Psychometrika 


VOLUME XIV —1949 
JANUARY-DECEMBER 





Editorial Council 


Chairman: L. L. THURSTONE
Editors: A. K. KURTZ, M. W. RICHARDSON
Managing Editor: HAROLD GULLIKSEN
Assistant Managing Editor: DOROTHY C. ADKINS


Editorial Board 


R. L. ANDERSON CHARLES M. HARSH P. J. RULON 

H. S. CONRAD PAUL HORST WM. STEPHENSON 

ELMER A. CULLER ALSTON S. HOUSEHOLDER S. A. STOUFFER 

E. E. CURETON TRUMAN L. KELLEY GODFREY THOMSON 

JACK W. DUNLAP ALBERT K. KURTZ L. L. THURSTONE 

MAX D. ENGELHART IRVING LORGE LEDYARD TUCKER 

HENRY E. GARRETT QUINN MCNEMAR ROBERT J. WHERRY 

J. P. GUILFORD CHARLES I. MOSIER S. S. WILKS 

HAROLD GULLIKSEN FREDERICK MOSTELLER HERBERT WOODROW 
M. W. RICHARDSON 





PUBLISHED QUARTERLY 


By THE PSYCHOMETRIC SOCIETY 
AT 23 WEST COLORADO AVENUE 
COLORADO SPRINGS, COLORADO 






















Psychometrika





CONTENTS 


A COMPARATIVE STUDY IN MULTIPLE-FACTOR ANALYSIS OF “NEUROTIC” TENDENCY - - - 1
I. D. MacCRONE AND A. STARFIELD

A GENERALIZED EXPRESSION FOR THE RELIABILITY OF MEASURES - - - 21
PAUL HORST

A FACTORIAL APPROACH TO JOB FAMILIES - - - 33
CLYDE H. COOMBS AND GEORGE A. SATTER

NOTE ABOUT THE MULTIPLE GROUP METHOD - - - 43
L. L. THURSTONE

A STATISTICAL CRITIQUE OF THE USAFI TESTS OF GENERAL EDUCATIONAL DEVELOPMENT - - - 47
WARREN G. FINDLEY AND NEAL B. ANDREGG

ON THE THEORY OF TEST DISCRIMINATION - - - 61
GEORGE A. FERGUSON

NOTE ON THE COMPUTATION OF PRODUCT-MOMENT CORRELATION COEFFICIENTS - - - 69
DOROTHY C. ADKINS

VAUGHN, K. W. (Editor). National Projects in Educational Measurement.
CHAUNCEY, HENRY (Editor). Exploring Individual Differences. - - - 75
DOROTHY C. ADKINS








VOLUME FOURTEEN MARCH 1949 NUMBER ONE 











NOTICES 


The Treasurer of the Psychometric Corpora- 
tion is authorized to accept a one-year gift sub- 
scription for any library in a devastated country 
for the calendar year of 1949 only for $3.00. 


The International Congress of Pedagogy will 
be held in Santander, Spain, during the second 
fortnight of July, 1949. The congress will discuss 
the principal problems and conceptions of modern 
pedagogy. Those interested in further informa- 
tion should write 


Secretaria del Congreso Internacional de 
Pedagogia. 

Instituto «San José de Calasanz». 

Serrano, 123. 

MADRID (España).











PSYCHOMETRIKA—VOL. 14, NO. 1 
MARCH, 1949 


A COMPARATIVE STUDY IN MULTIPLE-FACTOR ANALYSIS 
OF “NEUROTIC” TENDENCY 


I. D. MACCRONE AND A. STARFIELD 


DEPARTMENT OF PSYCHOLOGY, UNIVERSITY OF THE WITWATERSRAND, 
JOHANNESBURG, SOUTH AFRICA 


The original aim of this investigation was to discover what re- 
lations exist between race attitudes and certain personality traits. 
For that purpose, a standard neurotic inventory and an attitude 
scale were applied to three dissimilar groups of subjects, and the 
results, obtained by multiple-factor analysis carried out indepen- 
dently for each of the three groups, compared with one another. 
The components of neurotic tendency, as displayed by each of the 
three groups, show a high measure of agreement and appear to be 
reconcilable with other interpretations of neurotic personality. So 
far as the original aim of the study is concerned, there appears to 
be a slight but significant tendency for the neurotic factor of hyper-
sensitiveness to affect the race attitudes of Europeans towards the
native in South Africa in a negative or unfavorable direction. 


I 

The original impetus to this study was provided by the attempt to 
discover whether any relation could be discovered between neurotic 
tendency and race attitudes. For this purpose, the Clark-Thurstone 
Inventory (8, 9) was applied to three groups of subjects—English- 
speaking South Africans, Afrikaans-speaking South Africans, and 
Jews—together with a scale for measuring attitude towards the na- 
tive. The subjects in question were all Europeans, all South-African 
born, all students in their first year of study at the University of the 
Witwatersrand, and all members of the male sex. 

It has for some time been a commonplace assumption among in- 
vestigators into race attitudes that the neurotic personality is more 
likely to display race attitudes in a more extreme form or that race 
attitudes, particularly of a negative or hostile nature, are more likely 
to be a function of the neurotic personality than of the more normal 
personality. If there is any validity in this hypothesis, then it would 
seem that a multi-racial society, such as we find in the Union of South 
Africa, should be able to provide some supporting evidence, especially 
when we bear in mind (a) the wide range of individual differences in 
race attitudes towards the native as displayed by members of the three 
European groups and as measured by the scale, (b) the marked race 
or “color” consciousness which finds expression in the behavior of so
many Europeans, and (c) the widespread inter-racial tensions and
conflicts which are such a conspicuous feature of South African 
society. 

In actual fact, however, the race attitudes of the subjects, as 
measured by the scale, show very little impregnation by neurotic ten- 
dency, as measured by the inventory—a result which may be due in 
part either to the inadequacy of the instruments employed or to the 
relative immaturity of the subjects available for the investigation or 
to a combination of both. But, although the findings throw very little 
light on the influence of neurotic personality upon race attitudes, they 
may serve to provide significant evidence on the following points: 

(1) What factors or components are actually present in what 
is loosely described as “neurotic” tendency, or what are the charac- 
teristics of the “neurotic” personality as measured by the Clark- 
Thurstone Inventory? 

(2) How closely do the results of multiple-factor analysis, car- 
ried out independently on two or more groups of subjects, agree with 
one another, or how genuine are the neurotic factors actually located 
by multiple-factor analysis? 

(3) Does “neurotic” tendency or the “neurotic” personality dis- 
play the same composition in culturally dissimilar groups, or to what 
extent are differences in cultural background reflected in “neurotic” 
tendency displayed by subjects who are members of such culturally 
diverse groups as English-speaking and Afrikaans-speaking South
Africans or Afrikaans-speaking South Africans and Jews?

(4) What measure of agreement exists between the interpreta-
tion of “neurotic” tendency based upon the findings of this study and
the interpretation of “neurotic” personality provided by more direct, 
qualitative or clinical methods? 

(5) And, finally, which of the factors or components of “neu- 
rotic” tendency actually affect race attitudes towards the native, what 
is the extent of this effect, and are the same factors operative in the 
same way in the race attitudes of English-speaking, Afrikaans-speak- 
ing, and Jewish subjects? 


II 
The correlation matrices for the English-speaking, Jewish and 
Afrikaans-speaking groups are given in Tables 1, 2, and 3. Correla- 
tion values have in every case been condensed from three to two deci- 
mal places. Applying Thurstone’s method of factorizing a correlation 
matrix, five factors were extracted from each matrix. The resultant 
centroid matrices for the three groups are given in Tables 4, 5, and 6. 
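
The extraction step just described can be sketched in code. What follows is only a minimal illustration of a centroid-style extraction, not the authors' own computation. The function name `centroid_factors` is hypothetical, the sign-reflection rule is a simple greedy one chosen for brevity, and the diagonal of the input matrix is assumed to hold communality estimates already.

```python
import math

def centroid_factors(R, n_factors, tol=1e-12):
    """Extract up to n_factors centroid factors from a symmetric matrix R.

    R is a list of lists whose diagonal is assumed to hold communality
    estimates. Returns (loadings, residual), where loadings[i][k] is the
    loading of variable i on factor k.
    """
    n = len(R)
    resid = [row[:] for row in R]
    loadings = [[] for _ in range(n)]
    for _ in range(n_factors):
        # Reflect variables (sign +1 or -1) until no single flip can raise
        # the sum of the reflected off-diagonal entries.
        s = [1.0] * n
        changed = True
        while changed:
            changed = False
            for i in range(n):
                off_sum = sum(resid[i][j] * s[j] for j in range(n) if j != i)
                if s[i] * off_sum < -tol:
                    s[i] = -s[i]
                    changed = True
        # Signed row sums and the grand total of the reflected matrix.
        col = [sum(resid[i][j] * s[j] for j in range(n)) for i in range(n)]
        total = sum(s[i] * col[i] for i in range(n))
        if total <= tol:
            break  # nothing left to extract
        a = [c / math.sqrt(total) for c in col]
        for i in range(n):
            loadings[i].append(a[i])
        # Remove the variance accounted for by this factor.
        for i in range(n):
            for j in range(n):
                resid[i][j] -= a[i] * a[j]
    return loadings, resid
```

When the input is generated by a single factor, the sketch returns that factor's loadings and a residual matrix of zeros. With real data one would inspect the residuals after each factor before deciding how many to keep.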
For the purpose of giving a psychological interpretation to as
many of the factors extracted as possible, the method suggested by
Reyburn and Taylor (1) for rotating the axes was followed. Inspec- 
tion of the weight and direction of the loadings of the various items 
by the several factors as shown in the centroid matrix, combined with 
the need for making psychological sense of each factor after rotation 
of all the axes by means of successive orthogonal transformations had 
been completed, gave the three factorial matrices which are set forth 
in Tables 7, 8, and 9. 
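
A single step of such a succession of orthogonal transformations can be illustrated as a rotation of two factor axes in their own plane. The sketch below is an assumption-laden illustration, not the Reyburn and Taylor procedure itself: the helper names `rotate_pair` and `communalities` are hypothetical, and the angle is supplied by the analyst rather than derived from any criterion.

```python
import math

def rotate_pair(loadings, p, q, degrees):
    """Rotate factor axes p and q through the given angle.

    loadings is a list of rows, one per item. Only columns p and q change;
    the transformation is orthogonal, so each item's communality is kept.
    """
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    rotated = []
    for row in loadings:
        new = list(row)
        new[p] = c * row[p] + s * row[q]
        new[q] = -s * row[p] + c * row[q]
        rotated.append(new)
    return rotated

def communalities(loadings):
    """Sum of squared loadings for each item."""
    return [sum(x * x for x in row) for row in loadings]

# Hypothetical two-factor loadings, used only to show the effect of one step.
example = [[0.6, 0.6], [0.7, -0.2]]
turned = rotate_pair(example, 0, 1, 45)
```

In this made-up example the first item, loaded equally on both axes before rotation, ends with all of its common variance on the first axis, while the communalities of both items are unchanged. Repeating such steps for different pairs of axes gives a succession of orthogonal transformations.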
III 

We are now in a position to identify the psychological compo- 
nents of neurotic tendency as measured by the Clark-Thurstone Inven- 
tory. It is clear that the neurotic tendency in question is a complex 
trait made up of several independent factors, and that the factors vary 
considerably among themselves in the extent to which they determine 
responses to individual items in the Inventory. Thus, if we take the 
group of English-speaking subjects, which gives the least ambiguous 
results, we find that the first four factors show the following signifi- 
cant item loadings ranked in descending order: 


Factor I

21  0.830  Do you lack self-confidence?
20  0.790  Are you self-conscious before superiors?
 7  0.780  Are you shy?
16  0.648  At a reception or tea do you avoid meeting the important person present?
 5  0.606  Do you keep in the background on social occasions?
 1  0.542  Do you get stage fright?
24  0.540  Do you feel inferior?
18  0.529  Do you hesitate to volunteer in a class discussion or debate?
 9  0.493  Do you get discouraged easily?
 8  0.425  Do you day-dream frequently?
11  0.421  Do you like to be alone?
25  0.411  Is it hard for you to make up your mind until the time for action is past?
15  0.404  Do you cross the street to avoid meeting someone?
17  0.396  Do you often feel just miserable?
18  0.386  Does it bother you to have people watch you work even when you do it well?
 6  0.380  Are you happy and sad by turns without knowing why?
 2  0.380  Do you worry over humiliating experiences?
14  0.373  Does criticism hurt you badly?
 4  0.319  Are your feelings easily hurt?
12  0.311  Do you cry easily?
19  0.306  Are you often lonely?
22  0.265  Are you self-conscious about your appearance?
10  0.251  Do you say things on the spur of the moment and then regret them?
 3  0.232  Are you afraid of falling when you are on a high place?













This factor when interpreted from the neurotic end may simply 
be identified as Lack of Self-Assurance or as a generalized Feeling of 
Inner or Subjective Insecurity. It is a general factor showing sig- 
nificant but widely varying loadings on all the items of the Inventory. 


Factor II

10  0.562  Do you say things on the spur of the moment and then regret them?
 6  0.430  Are you happy and sad by turns without knowing why?
 8  0.415  Do you day-dream frequently?
 9  0.414  Do you get discouraged easily?
25  0.389  Is it hard for you to make up your mind until the time for action is past?
17  0.370  Are you often just miserable?
 3  0.363  Are you afraid of falling when you are on a high place?
19  0.336  Are you often lonely?
 4  0.326  Are your feelings easily hurt?
24  0.301  Do you feel inferior?
14  0.281  Does criticism hurt you badly?
 2  0.281  Do you worry over humiliating experiences?


This factor when interpreted from the neurotic end may be iden- 
tified as Emotional and Conative Instability. It appears to be vir- 
tually identical with a factor described by Reyburn and Taylor in 
their analysis (4) of some of Webb’s data as well as in their further 
analysis (5) of the ten most diagnostic items of the Freyd-Heidbreder 
Introversion-Extraversion Inventory. These authors, interpreting the 
factor from the non-neurotic end, have given it the name of Perseverance
and add that it “differs from Webb’s ‘W’ chiefly by giving more
weight to steadiness, continuity, and perseverance than to action from
principle or purpose.” As it appears in the present context, the factor
in question reveals clearly enough the impulsive, uncontrolled
character of the neurotic personality, its lack of inner or subjective
character of the neurotic personality, its lack of inner or subjective 
balance that leaves it veering like a weathercock with every momen- 
tary or passing whim. 

Factor III 


2 0.566 Do you worry over humiliating experiences? 
14 0.545 Does criticism hurt you badly? 
4 0.542 Are your feelings easily hurt? 
19 0.251 Are you often lonely? 
22 0.220 Are you self-conscious about your appearance? 


This factor is quite straightforward. As a component of neurotic 
tendency it may simply be described as Hyper-Sensitiveness. 


Factor IV

22  0.673  Are you self-conscious about your appearance?
12  0.559  Do you cry easily?
24  0.306  Do you feel inferior?
17  0.299  Do you often feel just miserable?
19  0.276  Are you often lonely?


This fourth factor completes the list of significant components of 
neurotic tendency as measured by the Clark-Thurstone Inventory. It 
appears to call up the picture of a socially maladjusted, self-centered, 
highly strung and rather unhappy individual. As a neurotic trait it 
may best be described as Morbid or Pathological Self-Consciousness. 


IV 

So much for the factors comprising neurotic tendency in one 
group of subjects. We now turn to a comparison between these fac- 
tors and the results obtained from the application of multiple-factor 
analysis to the data based upon the returns of two other groups of 
subjects who took the inventory and the scale at the same time and 
under the same conditions as the first group. Since this latter group 
had the largest number of subjects and since the neurotic factors that 
could be identified for this group all appeared to be psychologically 
significant, the problem of comparison resolved itself into an attempt 
to identify the same neurotic factors of the two remaining groups. 
The results of the attempt are set forth in Tables 8 and 9. 

Factor I, identified as Lack of Self-Assurance, appears as the 
same basic factor in neurotic tendency for all three groups of sub- 
jects—English-speaking, Jewish, and Afrikaans-speaking. If we as- 
sume that a loading of less than .2 for any item by a particular fac- 
tor has little or no statistical significance, we may conclude that this 
factor is virtually a general neurotic factor since all the items of the 
inventory show significant loadings for the English-speaking group, 
while for the Jewish group all the items but two, and for the Afri- 
kaans-speaking group all the items but four, show significant loadings 
on this factor. The agreement between the respective weights of the 
loadings for the different items, as measured by rank-order correla- 
tions between the results of the different groups, turns out to be very 
high, being .96 for the English-speaking and Jewish groups, .94 for 
the English-speaking and Afrikaans-speaking groups, and .93 for the 
Jewish and Afrikaans-speaking groups. 
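
The two checks used in this paragraph, the .2 cutoff for a significant loading and the rank-order correlation between the loadings of two groups, can be sketched as follows. The loadings shown are hypothetical values invented for the illustration and are not taken from Tables 7, 8, or 9. The function names are likewise assumptions, and ties are handled by average ranks, which the original computation may or may not have done.

```python
import math

def salient_items(loadings, cutoff=0.2):
    """Item numbers whose absolute loading reaches the cutoff."""
    return [item for item, value in loadings.items() if abs(value) >= cutoff]

def average_ranks(values):
    """Ranks from 1 upward, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_order_correlation(x, y):
    """Product-moment correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical loadings of five items on one factor in two groups.
group_a = {1: 0.54, 7: 0.78, 20: 0.79, 21: 0.83, 3: 0.15}
group_b = {1: 0.50, 7: 0.81, 20: 0.70, 21: 0.85, 3: 0.22}
items = sorted(group_a)
rho = rank_order_correlation([group_a[i] for i in items],
                             [group_b[i] for i in items])
```

The same comparison would be run over all twenty-five items for each pair of groups. The sketch does not guard against a factor on which every item has the same loading, in which case the correlation is undefined.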

Factor II, identified as Emotional and Conative Instability in the
case of the English-speaking group, appears to coincide completely 
with Factor II in the case of the Jewish group. For the Afrikaans- 
speaking group the corresponding factor appears as Factor III; and 
here too the equivalence is very close although agreement with regard 
to the respective weights of the loadings for the different items is not
very great. Item 12, “Do you cry easily?”, for example, which has the
highest loading on Factor III for the Afrikaans-speaking group, has 
no statistically significant loading on the corresponding Factor II for 
either the English-speaking or Jewish groups. 

Factor III, identified as Hyper-Sensitiveness in the case of the 
English-speaking group, appears again as Factor III in the case of 
the Jewish group. Its nearest equivalent in the case of the Afrikaans- 
speaking group is Factor II, which, however, shows significant load- 
ings for a number of items which in the case of the English-speaking 
and Jewish groups are significantly loaded only on the previous factor. 

Finally, Factor IV, identified as Morbid Self-Consciousness in the 
case of the English-speaking group, may be regarded as approxi- 
mately equivalent to the fourth factor in the case of both the Jewish 
and Afrikaans-speaking groups. 

The results of comparison would appear to point to the conclu- 
sion that all the neurotic factors that have been identified are genuine 
and not artifacts, since in at least two out of the three groups whose 
returns were analyzed independently the equivalence between corre- 
sponding factors is in every case complete or very nearly complete. The 
divergence in the case of the third group on some of the factors, a 
divergence which is in the direction not of increased clarity but of
increased ambiguity in interpretation, may be adequately accounted 
for if we bear in mind (a) the smaller size of the group in question 
and (b) the fact that subjects of this group, although bilingual, are 
primarily Afrikaans-speaking and were obliged to take both inventory 
and scale through the medium of English. 

A further point which emerges from these results is that the neu- 
rotic tendency in question does not appear to be affected by differ- 
ences in cultural background such as distinguish English-speaking, 
Jewish, and Afrikaans-speaking groups in this country. While a good 
deal has been heard in recent years of the influence of cultural differences
upon differences in personality make-up, such evidence as is provided
by this study points to the conclusion that the same components
of neurotic tendency are present in all three groups. Such a conclu- 
sion does not, of course, exclude the possibility that there are other 
ways in which cultural differences between the groups in question may 
affect the individual personalities of their members. 


V 
In this section we propose to deal in a more qualitative way with 
the concept of neurotic tendency or of the neurotic personality. For 
this purpose, we may consider various interpretations of this concept 
which have been formulated by other investigators and more particularly
by Karen Horney and by L. L. and T. G. Thurstone. It is a notorious
fact that both the concept of neurotic tendency and the concept of
introversion-extraversion have become so ambiguous that they
have virtually ceased to have any value for purposes of the scientific
analysis of personality.

Flanagan, for example, in his factor analysis (2) of the Bern- 
reuter Personality Inventory, which purports to be a measure of not 
less than four independent personality variables, including neurotic 
tendency and introversion-extraversion, finds only two independent 
personality factors of which the first, “representing about 78 per cent 
of the non-chance variance,” is interpreted by him “as distinguishing 
between the self-confident, well-adjusted, socially-aggressive, ‘thick- 
skinned’ individual and the self-conscious, shy, emotionally-unstable 
individual.” This single factor, which is described by the author as self- 
confidence, may, in the light of our analysis, be regarded as the equiva- 
lent of at least three out of the four factors which were identified in 
neurotic tendency. Reyburn and Taylor, again, in their multiple-fac- 
tor analysis of introversion-extraversion referred to above, in addi- 
tion to identifying the factor of perseverance which we have already 
equated with one of our own, also identify a fourth factor which they 
describe as “a form of social toughness or tenderness” and which may 
be equated with the factor which we have described as hyper-sensi- 
tiveness. 

In her book, The Neurotic Personality of Our Time (3), Karen 
Horney gives in an introductory chapter a purely descriptive account, 
based upon “surface observation” or “what a good observer can dis- 
cover without the tools of psychoanalytic technique,” of the traits of 
the neurotic personality in our culture. We may assume that a psychoanalyst
of Horney’s standing should be well qualified, as a result of
her intimate experience with neurotic personalities, to provide us with 
an adequate analysis of neurotic traits based upon first-hand obser- 
vation. According to her account, the neurotic traits or “attitudes 
thus observable may be loosely classified as follows: first, attitudes 
concerning giving and getting affections; second, attitudes concerning 
evaluation of the self; third, attitudes concerning self-assertion; 
fourth, aggression; fifth, sexuality.” 

Since the inventory employed in this study included no items re- 
lating to the field of sexuality, we may omit any further reference to 
this particular type of neurotic trait. Of the remaining four men- 
tioned by Horney, the first and second are equivalent to what we have 
identified as the factors of hyper-sensitiveness and lack of self-assur- 
ance. Thus we are told: “As to the first, one of the predominant trends 
of neurotics of our time is their excessive dependence on the approval
or affection of others...... They may feel hurt, for example, if someone
does not accept their invitation, does not telephone for some time,
or even only if he disagrees with them in some opinion. This sensi- 
tivity may be concealed by a ‘don’t care’ attitude”; while with regard 
to the second we are told that “the inner insecurity expressed in this 
dependence on others is the second feature that strikes us in neurotics 
on surface observation. Feelings of inferiority and inadequacy are 
characteristics that never fail.” 

Of her two remaining sets of neurotic traits, namely, difficulties 
concerned with self-assertion and difficulties concerned with aggres- 
sion, we have found no directly equivalent factors in our own analysis. 
This may be due to the fact that, in the former case, there would seem 
to be little justification for Horney’s distinction in any independent 
way between lack of self-assurance or self-confidence and difficulties 
with regard to self-assertion, since the neurotic personality charac- 
terized by feelings of inner or subjective insecurity is more than likely 
to meet with such difficulties. With regard to the difficulties concerned
with aggression, which would appear to be an important neurotic 
trait, we find no equivalent factor—a failure which, again, as in the 
case of sexuality, may be due to the absence of items in the inventory
dealing directly with aggressive types of response. On the other hand, 
if we accept H. H. Anderson’s analysis of the concept of domination 
or dominative behavior (1), which in many ways appears not unlike 
Horney’s aggression described by her as “a propensity to be aggres- 
sive, domineering, over-exacting, to boss, cheat or find fault,” we may 
assume provisionally that this neurotic trait (in our culture), as in 
the case of difficulties with regard to self-assertion, is also a reflection 
of the more fundamental lack of self-assurance or self-confidence since 
Anderson, on the basis of his own observations of the behavior of 
young children, claims to have found evidence for the view that domi- 
nation is “the behavior of the insecure person.” 

On the other hand, we find nothing in Horney’s analysis compar- 
able to the important neurotic factor of emotional and conative insta- 
bility or to the neurotic factor more doubtfully identified as morbid 
self-consciousness. On the whole, it would seem that while there is 
nothing incompatible between the results obtained from a purely quali- 
tative analysis, such as represented by Horney’s, and those based upon 
multiple-factor analysis, the former cannot be accepted too readily at 
their face value. In evaluating the relative merits of the two ap- 
proaches, we do well to bear in mind Spearman’s dictum “that two 
legs are better than either alone.” 

L. L. and T. G. Thurstone, in giving a psychological interpretation
of the neurotic personality, suggest that “the fundamental characteristic
of the neurotic personality is an imagination that fails to
express itself in external social reality...... The biological function 
of imagination is here regarded as preparation for action, and imagi- 
nation is here regarded as unfinished action. Its natural course is to 
complete itself in overt action. The neurotic personality is one which 
fails somehow in the relation between imagination and external social 
reality.” 

This interpretation, which attempts to find the common psycho- 
logical trait present in the most diagnostic items of the Thurstone 
Neurotic Inventory (6), is clearly derived from L. L. Thurstone’s 
functional theory of mind as formulated in his book, The Nature of 
Intelligence (7). Since most of these items reappear in the Clark- 
Thurstone Inventory, it becomes possible to test the Thurstone inter- 
pretation of neurotic personality by the results obtained from our own 
multiple-factor analysis. When this is done, it would seem that the 
fundamental characteristic of the neurotic personality according to 
this interpretation is equivalent to the general neurotic factor of lack 
of self-assurance. It is just this lack of self-assurance or feeling of 
inner insecurity that makes it difficult, if not impossible, for the neu- 
rotic personality to pass from the imaginative to the overt expression 
of impulse in behavior, or that accounts for the failure of imagina- 
tion in the neurotic personality to express itself effectively on exter- 
nal social reality. 


VI 

In this final section, we turn to a consideration of the relation 
between neurotic tendency and attitude towards the native. The fac- 
tor loadings for each of the four neurotic factors on attitude for each 
of the three groups of subjects are set forth in Table 10. 

The results for the three groups are in complete agreement, since 
they show that only one neurotic factor, that of hyper-sensitiveness, 
plays any part in affecting the race attitudes of the subjects. Although 
the per cent of variance due to this factor is very small, being only 6 
per cent for the English-speaking group and 10 per cent for the Jew- 
ish and Afrikaans-speaking groups, it may be regarded as having some 
significance, especially in view of the complete absence in all the 
groups of any effect upon race attitudes due to any of the other three 
neurotic factors. Since high scores on the scale are indicative of an 
unfavorable attitude towards the native, it may be concluded that, 
irrespective of group membership, hyper-sensitiveness in the individ- 
ual is likely to be found going along with a slight tendency towards a 
somewhat less favorable attitude towards the native. 
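
The relation between a loading and the per cent of variance quoted above can be made concrete. The sketch assumes the usual orthogonal-factor convention, under which the share of a variable's variance attributable to one factor is the square of its loading on that factor. Whether Table 10 was computed in exactly this way is an assumption, and the function names are hypothetical.

```python
import math

def percent_variance(loading):
    """Per cent of a variable's variance attributable to one orthogonal factor."""
    return 100.0 * loading ** 2

def implied_loading(percent):
    """Absolute loading that would account for the given per cent of variance."""
    return math.sqrt(percent / 100.0)

# Loadings implied by the percentages mentioned in the text, under the
# squared-loading assumption stated above.
implied = {pct: round(implied_loading(pct), 3) for pct in (6, 10)}
```

Under that assumption, 6 and 10 per cent of variance correspond to loadings of roughly .24 and .32 in absolute value, which is small but above the .2 cutoff adopted earlier.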













The complete absence of any relation between the neurotic factor 
of lack of self-assurance and race attitudes is surprising in view of 
the hypothesis so frequently advanced that it is the individual char- 
acterized by personal or subjective insecurity who is most likely as a 
member of a group to develop negative, hostile, or generally unfavor- 
able attitudes in a group-conflict situation towards members of groups 
other than his own. Especially in a multi-racial society, such as we 
find in South Africa, where there is so much color prejudice and inter- 
racial tension between white and black and where every European, in 
view of the marked disparity in numbers between white and black, is 
alleged to harbor fears (if only at the back of his mind) about the 
future security of the numerically inferior white population, it might 
seem reasonable to suppose that such widespread feelings of social 
insecurity, created by the “black menace” or by the “rising tide of 
color” or by the threat of “being swamped by the black,” would be 
reinforced by the individual’s own private feeling of personal or sub- 
jective insecurity. Actually, however, to judge from the available re- 
sults, there is no evidence at all to show that there has been any such 
mechanism of projection at work that would tend to stiffen the atti- 
tude of the more neurotic European towards the native. 





Editors’ Note: It has been pointed out by reviewers of this ar- 
ticle that it would be of interest if the findings were related to Guil- 
ford’s factors S, E, and M (Guilford, J. P., Personality factors S, E, 
and M, and their measurement, J. Psychol., 1936, 2, 109-127) and to 
Mosier’s finding of highly similar factors in a still different cultural 
milieu (Mosier, C. I., A factor analysis of certain neurotic symptoms, 
Psychometrika, 1937, 2, 109-127). Reference to these factors and re- 
conciliation of surface differences by rotation should provide neat con- 
firmation that factor pattern is invariant under wide changes in popu- 
lation, since personality items are presumably dependent on cultural 
background. 


REFERENCES 

1. Anderson, H. H. Domination and social integration in the behavior of kindergarten children and teachers. Genet. Psychol. Monog., 1939, 21, 287-385.

2. Flanagan, J. C. Factor analysis in the study of personality. Stanford University, California: Stanford University Press, 1935.

3. Horney, K. The neurotic personality of our time. London: Kegan Paul, 1937.

4. Reyburn, H. A. and Taylor, J. G. Some factors of personality. Brit. J. Psychol., 1939, 30, 151-165.

5. Reyburn, H. A. and Taylor, J. G. Factors in introversion and extraversion. Brit. J. Psychol., 1941, 31, 335-340.

6. Thurstone, L. L. and T. G. A neurotic inventory. J. soc. Psychol., 1930, 1, 1-30.

7. Thurstone, L. L. The nature of intelligence. London: Kegan Paul, 1924.

8. Willoughby, R. R. Norms for the Clark-Thurstone Inventory. J. soc. Psychol., 1934, 5, 91-97.

9. Willoughby, R. R. Some properties of the Thurstone Personality Schedule and a suggested revision. J. soc. Psychol., 1932, 3, 401-424.











TABLE 1
Correlation Matrix
English-speaking South Africans, N = 107

[Matrix entries not legible in this copy.]

A: Scale for Measuring Attitude Towards the Native.
1, 2, 3, 4, etc.: Individual items on the Clark-Thurstone Inventory.
Decimal points are omitted.





TABLE 2
Correlation Matrix
Jews, N = 114

[Matrix entries not legible in this copy.]

A: Scale for Measuring Attitude Towards the Native.
1, 2, 3, 4, etc.: Individual items on the Clark-Thurstone Inventory.
Decimal points are omitted.














PSYCHOMETRIKA 


I: Scale for Measuring Attitude Towards the Native.
1, 2, 3, 4, etc.: Individual items on the Clark-Thurstone Inventory.
Decimal points are omitted.











TABLE 3
Correlation Matrix
Africaans-Speaking South Africans, N = 84

[The matrix is printed rotated in the original; its entries are not recoverable here.]













TABLE 4
Centroid Matrix
English-Speaking South Africans

          I     II    III     IV      V     h²

  I     042   -137   -095    137    162    075
  1     509    229    069    097    063    330
  2     527   -192   -264    347    134    523
  3     323   -095   -117   -205   -163
  4     509   -288   -267    289    106    507
  5     480    398    141    041    033    411
  6     520   -121   -037   -107   -238    355
  7     680    359   -019   -073    121    612
  8     532   -072   -164   -137   -160    359
  9     607   -130   -097   -235    076    456
 10     431   -275   -152   -290   -135    387
 11     381    204    063   -173   -131    238
 12     423   -193    396   -077    267    451
 13     372    196   -134    239   -223    302
 14     568   -211   -091    439   -125    584
 15     470    021    126    113   -118    264
 16     607    292    072    052   -161    487
 17     537   -120    244   -152   -216    433
 18     492    220    049    063   -018
 19     506   -268    131    125   -082    368
 20     640    374   -223   -204    401    802
 21     749    306   -146   -156    204    742
 22     449   -327    468    069    264    604
 24     648   -105    138   -158    131    493
 25     529   -097   -141   -071   -128    331










TABLE 5
Centroid Matrix
Jews

          I     II    III     IV      V     h²

  I     034    141   -131    192    232    129
  1     567   -267   -157   -244   -070    482
  2     467    182   -179   -250    166    374
  3     231    110   -358    171    189    258
  4     424    327   -102   -061    342    418
  5     347   -450   -297   -173    066    445
  6     387    320   -058   -171   -309    380
  7     599   -265    062   -117     18    464
  8     504    134   -164   -181   -168    359
  9     689    087    197   -088   -069    534
 10     463    155   -167    151   -219    337
 11     299   -065   -212    090   -367    282
 12     280    217   -123    031    111    154
 13     500   -160   -101    287    107    379
 14     663    178   -146    110    100    515
 15     391   -226   -074   -141   -178    261
 16     530   -263   -158    264    041    446
 17     465    234    245   -268   -254    468
 18     522   -249   -199   -115    021    387
 19     494    202    360   -151    089    445
 20     695   -101    200    143    119    568
 21     714   -048    335    089   -266    703
 22     513   -017     23   -168    266    419
 23     559    114    264    256   -050    463
 24     561   -242    346    095    218    550
 25     749    059    323    230   -151    745
















TABLE 6
Centroid Matrix
Africaans-Speaking South Africans

          I     II    III     IV      V     h²

  I     150    160   -073    162    172    109
  1     403   -311    347   -138   -178    430
  2     631   -111    242    028    066    474
  3     392    069    078    117   -195    216
  4     335    346   -143   -017    127    269
  5     512   -183   -057   -268   -169    399
  6     473    384    267    038   -114    457
  7     500    040   -262   -466   -187    572
  8     275    149    279   -218   -165    250
  9     554    114    238    011    075    382
 10     434    245    268    076    111    338
 11     318    179   -031   -134    070    157
 12     546    264    212   -123   -313    526
 13     260   -172    019    180    161    156
 14     408    210   -181    440    203    478
 15     477    030   -352    316   -226    503
 16     533   -303   -068    154   -188    440
 17     477    372    111    146   -241    458
 18     481   -523    282   -147    392    760
 19     338    302    046    080    186    249
 20     394   -152   -363   -198    119    363
 21     428   -422    249   -020    140    443
 22     358    115   -327   -242    221    356
 23     401   -153   -248    167   -228    326
 24     404   -230   -386   -174   -112    408
 25     571   -261   -130    243    187    505










TABLE 7
Factorial Matrix
English-Speaking South Africans

          I     II    III     IV      V     h²

  I    -013    025    239    012    127    074
  1     542    010    065    046   -170    329
  2     380    231    566   -045    054    523
  3     232    363   -082   -051   -017    195
  4     319    326    542   -021    071    507
  5     606   -155   -048    069   -115    412
  6     380    430    008    043   -151    354
  7     780    003   -025    001    050    612
  8     425    415    019   -065    042    359
  9     493    414    019    089    182    456
 10     251    562   -047    012    075    387
 11     421    108   -193    019   -107    238
 12     311    120    063    559    145    449
 13     386    041    174   -186   -293    301
 14     373    281    545    054   -252    582
 15     404    140    116    152   -210    263
 16     648    059   -007    019   -252    487
 17     396    370   -091    299   -200    431
 18     529    011    049    028   -117    297
 19     306    336    251    276    144    366
 20     790   -027   -023   -118    404    802
 21     830    100   -006   -057    199    742
 22     265    155    220    673    080    602
 24     540    301    037    306    118    491
 25     411    389    085   -029   -045    330




























TABLE 8
Factorial Matrix
Jews

          I     II    III     IV      V     h²

  I    -020   -003    317   -079    146    128
  1     641    050   -062   -088   -240    483
  2     381    268    315    015   -238    373
  3     185    075    358   -281    089    255
  4     277    255    513    102    044    417
  5     513    234   -009   -246   -256    444
  6     241    558   -064   -007   -083    380
  7     656   -066    060    140   -079    464
  8     427    387   -028   -080   -140    359
  9     595    327    017    264    042    533
 10     369     87    019   -172    175    334
 11     304    222   -215   -294    088    282
 12     182    217    267   -029    036    154
 13     518    048    159   -108    268    379
 14     548    308    316    040    136    515
 15     454    058   -176   -077   -123    262
 16     588   -083    079   -183    231    446
 17     334    481   -144    301   -106    465
 18     591   -009    041   -134   -138    388
 19     367    279    111    469    002    445
 20     666    057    117    235    229    568
 21     654    286   -225    277    254    701
 22     476    036    205    376   -081    418
 23     449    245    038    241    377    463
 24     591   -149    070    370    184    547
 25     645    310    060    286    388    748










TABLE 9
Factorial Matrix
Africaans-Speaking South Africans

          I     II    III     IV      V     h²

  I    -002    321   -066   -020   -040    109
  1     537   -144    275   -159    141    430
  2     578    269    192   -169    055    475
  3     333    152    108    012    264    215
  4     113    429    053    221   -144    269
  5     590   -031    070    212    031    400
  6     212    422    468    088    078    456
  7     522    011    072    526   -132    572
  8     231    034    432    090   -022    250
  9     411    360    279   -078   -018    383
 10     231    417    316   -101   -038    339
 11     215    230    112    155   -145    157
 12     377    245    492    244    148    526
 13     246    176   -132   -208    055    155
 14     109    645   -179   -032    129    478
 15     288    386   -205    265    402    506
 16     547    115   -099    001    343    440
 17     188    443    355    208    252    462
 18     693   -029   -050   -470   -234    759
 19     114    453    132    015   -113    249
 20     442    095   -282    235   -155    363
 21     569   -017    014   -345    008    443
 22     296    243   -160    293   -312    355
 23     357    195   -282    163    233    326
 24     484   -012   -263    320    044    408
 25     525    341   -273   -155    119    505


TABLE 10
Relation Between Neurotic Factors and
Attitude Towards the Native

                                         Attitude Towards the Native
Neurotic Factors                      Eng.-sp. S.A.    Jews    Afrik.-sp. S.A.

Lack of Self-Assurance                    -.013       -.020        -.002
Emotional and Conative Instability         .025       -.003        -.066
Hyper-Sensitiveness                        .239        .317         .321
Morbid Self-Consciousness                  .012       -.079        -.020














PSYCHOMETRIKA—VOL. 14, No. 1 
MARCH, 1949 


A GENERALIZED EXPRESSION FOR THE RELIABILITY 
OF MEASURES 


PAUL HORST 
UNIVERSITY OF WASHINGTON 


In certain situations it is important to obtain as many measures 
as possible, all presumably measuring the same function, for each 
of a group of persons. In general the number and source of the 
measures may vary from one member of the group to another. We 
take the mean of the measures for each person as the best estimate 
of the function for that person. The conventional formulas can not 
be used to determine the reliability of a set of means so obtained. A 
formula is developed which provides a unique estimate of the reli- 
ability of such a set of means. The formula is more general than 
some of the well-known reliability formulas, so that these formulas 
are shown to be special cases of the more general formula. 


The problem of evaluating proficiency on the job is basic to all 
programs of personnel administration and research. This is true in 
industry, education, government, and in all situations where the hu- 
man factor is important for the success of the enterprise. In many 
situations objective measures of proficiency are not feasible and then 
we resort to ratings which are more or less subjective. Examples of 
this procedure are merit ratings in industry and government, and 
ratings of instructors in educational institutions. It is recognized 
that one of the conditions to be satisfied by a proficiency evaluation 
program is that the ratings should be reliable or stable. Other things 
being equal, the more ratings available on each of a group of persons 
the more reliable will be the average ratings. 

One of the basic requirements of a proficiency rating program is 
that each person should be rated by the maximum number of persons 
qualified to rate him. When this requirement is satisfied the number 
of ratings available for each person rated will in general vary. In the 
case of students’ ratings of instructors, the number of ratings for 
each instructor is determined by the number of students who have 
been subject to his instruction. 

In any situation where we obtain as many measures as possible,
all supposedly measuring the same function, for each of a group of persons
to be evaluated, we may take the average of the measures for a
person as an estimate of his standing for the particular function. In
general the number of measures will vary from one member of the
group to another. The problem of determining the reliability of such
a set of average measures is of considerable importance. The conventional
methods for determining reliability are not readily applicable.

One might, of course, divide the measures for each individual 
into approximately equal groups, calculate two averages for each per- 
son, correlate the pairs of averages and, by means of the Spearman- 
Brown formula, calculate the estimated reliability of the averages 
based on all the measures for each person. In general the question 
of how to divide the measures for each person is more troublesome 
than in the split-half method of determining test reliability. Here at 
least we have the same number of measures (items) for each per- 
son. Furthermore, each item retains its identity for all persons, so 
that once we have decided how to split the items for one person the 
method of splitting applies to all persons. When the measures or 
ratings do not in general come from the same items, raters, or instru- 
ments for all persons, the problem of how to split them becomes even 
more confusing. 

It seems desirable, therefore, to have a formula for computing 
the reliability of measures when the number of measures varies from 
one person to another and when the measurement sources or instru- 
ments are not necessarily all the same for one person as for another. 

Suppose now we have a number of measures all purporting to 
measure the same function on each of a number of persons. The num- 
ber of measures and the measurement sources or instruments are not 
necessarily the same for all persons in the sample. We take the mean 
of the measures for each person as the best estimate of the function 
for that individual. Conceivably we might process the measures be- 
fore averaging them in order to make measures from all sources more 
nearly comparable. We might specify, for example, that the meas- 
ures from each source have the same mean and standard deviation 
as those from any other source, irrespective of the number of cases 
from each source. 

But the formula we shall present makes no assumptions with ref- 
erence to processing of the data before averaging the measures for 
each person. However, by applying it to processed and raw data one 
may determine how any particular method of processing the data 
affects the reliability of the average measures. The formula is more 
general than some of the well-known reliability formulas, so that these 
formulas are special cases of the more general formula. 

First we shall present the general formula without proof and
indicate by a suggested work-sheet layout how it may be applied to
actual data. Later we shall develop the proof of the formula and also
show that several well-known reliability formulas are special cases
of it.


I. The Formula and its Applications 
If we let

$N$ = the number of persons,
$n_i$ = the number of measures for person $i$,
$M_i$ = the mean of these measures for person $i$,
$\sigma_i$ = the standard deviation of these measures for person $i$, and
$\sigma_M$ = the standard deviation of the means for the $N$ persons,

the estimate of the reliability of the means, $M_i$, is given by

\[ r = 1 - \frac{\dfrac{1}{N}\sum_{i=1}^{N}\dfrac{\sigma_i^2}{n_i - 1}}{\sigma_M^2}. \tag{1} \]
The computations required by the formula may be conveniently
carried out as indicated by the following column headings of a work-sheet:

   a     b     c     d     e     f     g     h
                     Σd          Σf          Σh

There will be a row of computations as indicated above for each person.
For any person the headings have the following meanings:

a = number of ratings
b = sum of ratings
c = sum of squares of ratings
d = b ÷ a
e = c ÷ a
f = d²
g = e − f
h = g ÷ (a − 1).














After the computations have been performed for each person, the
sums of the entries in each of the columns d, f, and h are calculated.
Then if we let

Σd = A
Σf = B
Σh = C

it can easily be shown that formula (1) can be written

\[ r = 1 - \frac{C}{B - \dfrac{A^2}{N}}. \tag{1a} \]


Formula (1a) should be used for computational purposes when the 
work-sheet layout suggested above is used. 

It should be pointed out that cases involving only one measure 
can not be included in the computations. This is clear from column h 
of the work-sheet where the divisor term is n — 1. The quantity 
would be zero in the case of only one measure. The numerator term 
would also be zero. Hence the ratio would be indeterminate. 
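
The work-sheet above can be sketched in code. The following is a minimal illustration only, assuming the ratings are held as one list per person; the function name `horst_reliability` and the sample ratings are hypothetical and are not taken from the article. Persons with a single measure are dropped, as the preceding paragraph requires.

```python
def horst_reliability(ratings_by_person):
    """Reliability of per-person means via formula (1a): r = 1 - C / (B - A**2 / N).

    ratings_by_person: list of lists, one inner list of measures per person.
    Persons with fewer than two measures are excluded (divisor a - 1 would be zero).
    """
    usable = [r for r in ratings_by_person if len(r) >= 2]
    N = len(usable)
    A = B = C = 0.0
    for ratings in usable:
        a = len(ratings)                  # number of ratings
        b = sum(ratings)                  # sum of ratings
        c = sum(x * x for x in ratings)   # sum of squares of ratings
        d = b / a                         # mean of the ratings
        e = c / a                         # mean square of the ratings
        f = d * d                         # squared mean
        g = e - f                         # variance of the ratings (divisor a)
        h = g / (a - 1)                   # estimated error variance of the mean
        A += d
        B += f
        C += h
    return 1.0 - C / (B - A * A / N)


# Hypothetical ratings for four persons with unequal numbers of raters.
sample = [[4, 5, 5], [2, 3], [3, 3, 4, 2], [5, 4]]
print(round(horst_reliability(sample), 4))  # about 0.7778
```

The sketch does not guard against a zero variance of the means (all persons having the same mean), in which case the denominator of (1a) vanishes.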


II. Mathematical Proof of the Formula 
Formula (1) is based on the well-known generalized formula for
the reliability coefficient

\[ r = 1 - \frac{\sigma_e^2}{\sigma_o^2}, \]

where $\sigma_e^2$ is the error variance and $\sigma_o^2$ is the observed variance of
the measures. The observed variance is obviously $\sigma_M^2$. In the development
which follows we shall justify the use of the expression

\[ \frac{1}{N}\sum_{i=1}^{N}\frac{\sigma_i^2}{n_i - 1} \]

as an estimate of the error variance.

Suppose first that it were possible to obtain $m$ sets of comparable
mean measures on each of the $N$ persons such that $n_i$, the number
of measures for each mean for person $i$, remains constant throughout
the $m$ sets of means. We would then have a matrix of means as
follows:

\[
\begin{matrix}
M_{11} & M_{12} & \cdots & M_{1m} \\
M_{21} & M_{22} & \cdots & M_{2m} \\
\vdots & \vdots &        & \vdots \\
M_{N1} & M_{N2} & \cdots & M_{Nm}
\end{matrix}
\]

where $M_{ij}$ is the mean for person $i$ taken from the $j$'th set of means.
We assume that the sets of means are all comparable in the sense that


1. the means of all columns of means are equal; 
2. the standard deviations of all columns of means are equal; 
3. the intercorrelations among all columns of means are equal. 


The correlation between two sets of means would be the reliability
of a set of means.

To simplify the derivation, we assume without loss of generality
that each column of means is given in terms of standard measures
so that the mean of any column of means is zero and its standard
deviation unity.

We shall now prove that the reliability of a column of means as
$m$ approaches infinity is given by

\[ r = 1 - \frac{\dfrac{1}{N}\sum_{i=1}^{N}\sigma_{M_i}^2}{\dfrac{1}{mN}\sum_{i=1}^{N}\sum_{j=1}^{m} M_{ij}^2}. \tag{2} \]

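First we note that the expression in the numerator of the second
term on the right of (2) is

\[ \frac{1}{N}\sum_{i=1}^{N}\sigma_{M_i}^2 = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{\sum_{j=1}^{m} M_{ij}^2}{m} - \left(\frac{\sum_{j=1}^{m} M_{ij}}{m}\right)^2\right]. \tag{3} \]

We let

\[ A = \Big(\sum_{i=1}^{N}\sum_{j=1}^{m} M_{ij}^2\Big)\Big/N, \tag{4} \]

\[ B = \sum_{i=1}^{N}\Big(\sum_{j=1}^{m} M_{ij}\Big)^2\Big/N. \tag{5} \]

Then from (3), (4), and (5), (2) can be written

\[ r = 1 - \frac{\dfrac{A}{m} - \dfrac{B}{m^2}}{\dfrac{A}{m}} \]

or

\[ r = \frac{B}{mA}. \tag{6} \]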
But the squared term in parentheses in (5) represents for each
person an expression of the form

\[
\begin{aligned}
& M_1^2 + M_1 M_2 + \cdots + M_1 M_m \\
{}+{} & M_2 M_1 + M_2^2 + \cdots + M_2 M_m \\
& \cdots \\
{}+{} & M_m M_1 + M_m M_2 + \cdots + M_m^2 .
\end{aligned}
\tag{7}
\]

According to (5), $B$ is a summation divided by $N$ for all persons, of
expressions such as (7). Remembering that the $M$'s are in standard
units we have

\[
\begin{aligned}
B = {} & 1 + r_{12} + \cdots + r_{1m} \\
{}+{} & r_{21} + 1 + \cdots + r_{2m} \\
& \cdots \\
{}+{} & r_{m1} + r_{m2} + \cdots + 1
\end{aligned}
\]

or, since all the $r$'s are assumed equal,

\[ B = m + m(m-1)r. \tag{8} \]

Likewise, because the $M$'s are in standard units, we have

\[ A = m. \tag{9} \]

Substituting (8) and (9) in (6),

\[ r = \frac{m + m(m-1)r}{m^2} \]

or

\[ r = \frac{1-r}{m} + r. \tag{10} \]

As $m$ approaches infinity (10) approaches an identity; hence (2) is
proved.


Actually, of course, we would have only a single column of
means, together with the variance for each mean. Therefore, we
would have to estimate $\sigma_{M_i}^2$ for each mean in equation (2). The best
estimate is well known to be

\[ \sigma_{M_i}^2 = \frac{\sigma_i^2}{n_i - 1}. \tag{11} \]

Furthermore, for the denominator term in (2) we take as an estimate
of the average variance of the $m$ sets of means simply the variance
$\sigma_M^2$ of the observed set of means, or

\[ \frac{\sum_{i=1}^{N}\sum_{j=1}^{m} M_{ij}^2}{mN} = \sigma_M^2. \tag{12} \]

Substituting (11) and (12) in (2) gives

\[ r = 1 - \frac{\dfrac{1}{N}\sum_{i=1}^{N}\dfrac{\sigma_i^2}{n_i - 1}}{\sigma_M^2}. \tag{13} \]

Equation (13) is the generalized reliability formula which we set out
to derive.

Formula (13) then gives us a general expression for the reliability
of a set of means, where the mean of each individual is derived
from a number of measures of the same function and the number
of measures may vary from one individual to another.

It should be emphasized that the correctness of the formula does 
not depend upon the assumption that an extremely large number of 
comparable measures is collected for each subject. It does, however, 
depend on the assumption that equation (11) gives the best estimate 
of the error variance of the mean. 

The formula, of course, makes no assumptions as to the source 
of the measures which enter into its computation. If, in particular, 
the measures are ratings, then the influence of collusion or hearsay 
might cause a spuriously high estimate of reliability just as in the 
special case where two raters’ ratings on the same persons are corre- 
lated to get an estimate of the reliability of the ratings. 


III. Well-Known Special Cases of the General Formula 
The formula is of particular interest because several of the well-known
reliability formulas can be derived from it as special cases.
Consider, for example, the formula

\[ r_n = \frac{nr}{1 + (n-1)r}. \tag{14} \]

This is the well-known general expression for estimating the reliability
of the lengthened form of a test where

$r$ = the reliability of the original test;
$n$ = the amount by which the test has been lengthened.

As a special case we assume that the measures in formula (13) are $n$
comparable forms of a test, and that these are given in terms of
standard scores. In this case, $n_i$ is a constant for all individuals, so
that (13) can be written








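\[ r = 1 - \frac{\dfrac{1}{N}\sum_{i=1}^{N}\sigma_i^2}{(n-1)\,\sigma_M^2}. \tag{15} \]

But

\[ \sigma_i^2 = \frac{\sum X_i^2}{n} - \left(\frac{\sum X_i}{n}\right)^2, \tag{16} \]

where the summations are over the scores for individual $i$;

\[ \sigma_M^2 = \frac{\sum_{i=1}^{N} M_i^2}{N} - \left(\frac{\sum_{i=1}^{N} M_i}{N}\right)^2, \tag{17} \]

where now the summations are over the mean scores for the $N$ individuals.
But $\sum M_i = 0$ since we are assuming standard scores. Furthermore,

\[ M_i = \frac{\sum X_i}{n}. \]

Therefore we can write (17)

\[ \sigma_M^2 = \frac{\sum_{i=1}^{N}\left(\dfrac{\sum X_i}{n}\right)^2}{N}. \tag{18} \]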


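Substituting (16) and (18) in (15), we have

\[ r = 1 - \frac{\dfrac{1}{N}\sum_{i=1}^{N}\left[\dfrac{\sum X_i^2}{n} - \left(\dfrac{\sum X_i}{n}\right)^2\right]}{(n-1)\,\dfrac{1}{N}\sum_{i=1}^{N}\left(\dfrac{\sum X_i}{n}\right)^2}. \tag{19} \]

Remembering that we are dealing with standard scores, it can be
shown that

\[ \frac{\sum_{i=1}^{N}\dfrac{\sum X_i^2}{n}}{N} = 1, \tag{20} \]

\[ \frac{\sum_{i=1}^{N}\left(\dfrac{\sum X_i}{n}\right)^2}{N} = \frac{n + n(n-1)r}{n^2}. \tag{21} \]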

Substituting (20) and (21) in (19) gives

\[ r = 1 - \frac{1 - \dfrac{n + n(n-1)r}{n^2}}{(n-1)\,\dfrac{n + n(n-1)r}{n^2}} \tag{22} \]

or

\[ r = 1 - \frac{(n-1) - (n-1)r}{(n-1)\,[1 + (n-1)r]} = 1 - \frac{1-r}{1 + (n-1)r} \]

or

\[ r_n = \frac{nr}{1 + (n-1)r}, \]

which is the same formula as the well-known formula (14).
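
The reduction to formula (14) can be checked numerically. The sketch below is illustrative only; the function names are hypothetical. It evaluates the right-hand side of (22), using the value given in (21) for the mean square of the averaged standard scores, and compares it with formula (14).

```python
def spearman_brown(r, n):
    """Formula (14): reliability of a test lengthened n times."""
    return n * r / (1 + (n - 1) * r)


def general_formula_equal_forms(r, n):
    """Right-hand side of (22): formula (13) for n standardized comparable
    forms whose intercorrelations all equal r."""
    mean_square = (n + n * (n - 1) * r) / n ** 2   # equation (21)
    error_term = 1 - mean_square                   # numerator of (22), using (20)
    return 1 - error_term / ((n - 1) * mean_square)


# The two expressions agree for any admissible r and n >= 2.
for r in (0.2, 0.5, 0.8):
    for n in (2, 3, 10):
        assert abs(spearman_brown(r, n) - general_formula_equal_forms(r, n)) < 1e-12
```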

We can also show that the Kuder-Richardson formula (21)* for
estimating the reliability of a test is a special case of formula (13).
The Kuder-Richardson formula (21) is

\[ r_{tt} = \frac{n}{n-1}\left(\frac{\sigma_t^2 - n\bar{p}\bar{q}}{\sigma_t^2}\right), \tag{23} \]

where

$r_{tt}$ = estimated reliability of the test,
$n$ = number of items in the test,
$\sigma_t^2$ = the variance of the test,
$\bar{p}$ = the average difficulty of the items, and
$\bar{q} = 1 - \bar{p}$.

First we may point out that for computational purposes (23)
may be simplified. If we write

\[ \bar{p} = \frac{M_t}{n}, \]

where $M_t$ is the mean of the test and

\[ \bar{q} = 1 - \frac{M_t}{n}, \]

then (23) can be written

\[ r_{tt} = \frac{1}{n-1}\left(n - \frac{nM_t - M_t^2}{\sigma_t^2}\right). \tag{24} \]

Although formula (24) has never been published to the writer's
knowledge, it is actually more convenient to use than (23), since only
the number of items, the mean, and the standard deviation are required
to estimate the reliability of the test.

*Kuder, G. F. and Richardson, M. W. The theory of the estimation of test
reliability. Psychometrika, 1937, 2, p. 158.

In estimating the reliability of a single test, we begin by regarding
each item in the test as a measure of a person's ability in the function
to be measured. He gets a score of either 1 or 0 for each item according
to whether he answers it correctly or incorrectly. Hence, the
measures for a given person will consist of either 1 or 0. We assume
that each person attempts every item so that the number of measures
for each person is equal to $n$, the number of items in the test. Going
back to equation (16), we note that both the sum of the scores and
the sum of their squares are simply the total number of items answered
correctly. Therefore, if we let $S_i$ be the total number of items
answered correctly by person $i$, we write instead of (16)

\[ \sigma_i^2 = \frac{S_i}{n} - \left(\frac{S_i}{n}\right)^2. \tag{25} \]

We also have

\[ M_i = \frac{S_i}{n} \tag{26} \]

and

\[ \bar{M} = \frac{\sum S_i}{Nn}. \tag{27} \]

Substituting (25), (26), and (27) in (15) gives

















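\[ r = 1 - \frac{\dfrac{1}{N}\sum\left(\dfrac{S_i}{n} - \dfrac{S_i^2}{n^2}\right)}{(n-1)\left[\dfrac{\sum S_i^2}{n^2 N} - \left(\dfrac{\sum S_i}{nN}\right)^2\right]} \tag{28} \]

or

\[ r = 1 - \frac{\dfrac{n\sum S_i}{N} - \dfrac{\sum S_i^2}{N}}{(n-1)\left[\dfrac{\sum S_i^2}{N} - \left(\dfrac{\sum S_i}{N}\right)^2\right]}. \tag{29} \]

But the expression in brackets in the denominator term of (29) is
simply $\sigma_t^2$, the variance of the test. Furthermore,

\[ \frac{\sum S_i}{N} = M_t, \qquad \frac{\sum S_i^2}{N} = \sigma_t^2 + M_t^2. \]

Hence we can write (29)

\[ r = 1 - \frac{nM_t - (M_t^2 + \sigma_t^2)}{(n-1)\,\sigma_t^2} \tag{30} \]

or

\[ r = \frac{1}{n-1}\left(\frac{n\sigma_t^2 - nM_t + M_t^2}{\sigma_t^2}\right). \tag{31} \]

(31) simplifies to

\[ r = \frac{1}{n-1}\left(n - \frac{nM_t - M_t^2}{\sigma_t^2}\right), \]

which is the same as formula (24), namely, the more convenient
transformation of Kuder-Richardson formula (21).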
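
The equivalence argued in this section can be illustrated with a small numerical check. This is a sketch under stated assumptions: the function names and the 0/1 item matrix are hypothetical, the variance of the test is taken with divisor N, and `kr21_from_moments` uses a rearrangement of the Kuder-Richardson formula (21) that needs only the number of items, the mean, and the variance, as the text describes for formula (24).

```python
def kr21(n, mean, variance):
    """Kuder-Richardson formula (21), with p = mean / n and q = 1 - p."""
    p = mean / n
    q = 1 - p
    return (n / (n - 1)) * (variance - n * p * q) / variance


def kr21_from_moments(n, mean, variance):
    """Algebraically equivalent form using only n, the mean, and the variance."""
    return (n - (n * mean - mean ** 2) / variance) / (n - 1)


def general_formula_binary(item_matrix):
    """Formula (15) applied to 0/1 item scores (rows are persons, columns items)."""
    N = len(item_matrix)
    n = len(item_matrix[0])
    scores = [sum(row) for row in item_matrix]            # S_i, items answered correctly
    # Mean over persons of sigma_i^2 = S_i/n - (S_i/n)^2, as in (25).
    error = sum(s / n - (s / n) ** 2 for s in scores) / N
    means = [s / n for s in scores]                        # M_i, as in (26)
    grand_mean = sum(means) / N
    var_of_means = sum(m * m for m in means) / N - grand_mean ** 2
    return 1 - error / ((n - 1) * var_of_means)


# Hypothetical answers of four persons to a four-item test.
items = [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
scores = [sum(row) for row in items]
mean = sum(scores) / len(scores)
variance = sum(s * s for s in scores) / len(scores) - mean ** 2

print(kr21(4, mean, variance))
print(kr21_from_moments(4, mean, variance))
print(general_formula_binary(items))
```

All three calls should print the same value up to rounding error, since each is a rearrangement of the others for 0/1 item scores.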











PSYCHOMETRIKA—VOL. 14, NO. 1 . 
MARCH, 1949 


A FACTORIAL APPROACH TO JOB FAMILIES 


CLYDE H. COOMBS AND GEORGE A. SATTER 
UNIVERSITY OF MICHIGAN* 


This is an experimental study of the application of factor analy- 
sis to a new domain—the formation of job families. Correlations be- 
tween jobs are computed from the formula based on the number of 
common elements between two variables and the job analyses provide 
the basic data on the presence or absence of the elements. A first- 
order general factor and four common factors are obtained in a small 
sample of twenty occupations. Tentative interpretations are made 
and implications for job analysis and the formation of job families 
are pointed out. 


1. The Nature of the Problem 

The determination of the dimensionality of the world of work 
and the composition of job families presents intriguing methodologi- 
cal problems. The organization of occupations into job families has 
generally been accomplished by the exercise of judgment, of expert 
and knowing judges perhaps, but still not objective and verifiable. 
This study is a report of an attempt to adapt job analyses to the tech- 
niques of multiple-factor analysis. 

A psychological analysis of the domain of occupations leads to 
the conclusion that there is not necessarily any single set of job 
families, but rather that there may be a different solution for each 
purpose which job families serve. 

For purposes of facilitating certain aspects of the work of Selec- 
tive Service during World War II, for example, three families of oc- 
cupations were created and used: critical, essential, and the remainder. 
The critical family was comprised of those occupations which had 
long training times, which were essential to a war activity, and in 
which there were national shortages. The essential family, similarly, 
was defined by occupations in an essential war activity, but in gen- 
eral these occupations required less training time than the members 
of the critical family. 

This illustrates the basic philosophy underlying the construction
of job families. They are created to facilitate or simplify the accomplishment
of particular objectives. Another objective during the recent
war was to expand the employment of women. Consequently,
another family was created: a family comprised of those jobs which
possessed the characteristics of being suitable for women.

*The analysis the results of which are reported here was made possible by
the Bureau of Psychological Services, Institute for Human Adjustment, Horace
H. Rackham School of Graduate Studies, University of Michigan.

There must be a very great variety of objectives which would 
be served by relevant job families. A few might be listed as follows: 


1) vocational guidance of individuals who are contemplating 
training for, or immediate entrance into, an occupational area, 

2) vocational guidance of various handicapped groups, 

3) the establishment of vocational training curricula, 

4) vertical transfer of personnel—promotion within the organi- 
zational unit, 

5) horizontal transfer of personnel, as in the case of the utiliza- 
tion of civilian skills in military occupations, 

6) the development of an interest inventory, 

7) the development of a differential aptitude test battery, 

8) a basis for the organization of unions, 

9) occupational representation in a legislature, as Toynbee* sug- 
gests, and 

10) the establishment of wages and salaries. 


To attain any one of these objectives, an appropriate system of 
job families would be desirable. To establish an appropriate set of 
families, however, requires first a job analysis designed to secure that 
kind of information about a job which is relevant to that objective. 
Having done that, an analysis of the sort carried out in this study 
appears to be an appropriate way to determine relevant and useful 
job families. 

There is nothing to be gained here by summarizing the various
systems on the basis of which jobs have been classified into groups
or families. Shartle's† book and an article by Cardall‡ describe some
of them.

Contrary to the above requisites for the formation of meaning- 
ful and useful job families, the present study was based on job analy- 
ses of a standard, almost universal character. The job analyses were 
made for an entirely different purpose without any thought of their 
appropriateness for such a study as this. Consequently, this study is 
primarily of methodological and theoretical interest. This study was 
carried out as a pilot investigation to determine the feasibility of 
this method of analysis. 


*Toynbee, Arnold J. A study of history. New York: Oxford University
Press, 1946, p. 617.

†Shartle, Carroll L. Occupational information. New York: Prentice-Hall,
Inc., 1946, p. 339.

‡Cardall, Alfred J. A test for primary business interests based on a functional
occupational classification. Educ. psychol. Meas., 1942, 2, 113-138.













2. The Job Analysis 


The data which form the basis of this investigation were collected 
in the course of evaluating 70 jobs in a large, mid-western paper mill. 
The methods of study were modeled after those developed and popu- 
larized by the United States Employment Service and involved sending 
trained job analysts to the departments of the plant employing per- 
sonnel in the job classifications up for evaluation. The analyst in- 
terviewed and observed. He interviewed the employee on the job, his 
immediate supervisor, and the departmental head under whose juris- 
diction the job fell; he observed the employee as he carried out the 
routines of his job. On the basis of these observations and the information
collected in the interviews, he prepared a job description; he
also assisted, with the immediate supervisor and the department head, 
in the preparation of a specification for the job. It is these specifica- 
tions which supply the raw materials for the current analysis. 

The specifications were recorded in the form of a “rating” on 
a standard specification sheet. The sheet made provision for record- 
ing 18 such judgments of various aspects of the skills and knowledges 
required by the jobs. Each of the 18 items was prefaced by a brief 
statement defining a particular skill or knowledge area, and this was 
followed by three or four alternative phrases or statements of vari- 
ous degrees of skill or knowledge. In this investigation these alterna- 
tives are regarded as elements—characteristics which make jobs alike 
or different, i.e., generate correlation between them. In all there were 
104 such elements distributed in the following manner: 








Category                        Possible no. of elements

educational skills                        18
work skills                               42
application skills                         9
social and personal skills                19
activity distribution                     16
Total                                    104


3. The Correlation Matrix 


In order to reduce the labor involved, the 70 occupations included 
in the job analysis were reduced to 54 by the arbitrary elimination of 
obvious doublets. For example, Senior Invoice Clerk was retained and 
Junior Invoice Clerk was dropped; Librarian was retained and Assist- 
ant Librarian was dropped. 

The correlations between the 54 occupations were computed on the
basis of the number of common elements by the use of the formula:*

\[ r_{ab} = \frac{n_c}{\sqrt{(n_a + n_c)(n_b + n_c)}} \]

where
$n_c$ = number of elements common to jobs a and b,
$n_a$ = number of elements in job a not in job b,
$n_b$ = number of elements in job b not in job a.
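
As an illustration of the common-elements formula, the following sketch computes the correlation between two jobs from their sets of elements. The function name and the element labels are hypothetical; the article's own job specifications are not reproduced here.

```python
from math import sqrt


def common_elements_r(elements_a, elements_b):
    """Correlation between jobs a and b from the number of common elements."""
    a, b = set(elements_a), set(elements_b)
    n_c = len(a & b)   # elements common to jobs a and b
    n_a = len(a - b)   # elements in job a not in job b
    n_b = len(b - a)   # elements in job b not in job a
    return n_c / sqrt((n_a + n_c) * (n_b + n_c))


# Hypothetical jobs described by labelled elements.
job_a = {"reads blueprints", "files records", "operates typewriter", "meets public"}
job_b = {"operates typewriter", "meets public", "drives truck"}
print(round(common_elements_r(job_a, job_b), 4))  # 2 / sqrt(4 * 3), about 0.5774
```

Note that $n_a + n_c$ is simply the total number of elements in job a, so the denominator is the geometric mean of the two jobs' element counts. The sketch does not handle a job with no elements.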


Inasmuch as this was to serve as a pilot study, the matrix of 
correlations of 54 jobs was of too large an order to factor. Conse- 
quently, a submatrix of order 20X20 was selected. The 20 variables 
whose intercorrelations were factored were selected on the basis of 
having the lowest sums of correlations with all the other variables. 
This was done in an effort to select 20 variables out of the original 
54 which would tend to span the same space as nearly as possible and 
thereby reveal some of the corners of the total configuration. 
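
The selection rule just described can be sketched as follows. This is an assumed reading of the procedure, not the authors' computation: each variable's correlations with all other variables are summed (the diagonal is excluded), and the k variables with the lowest sums are kept. The function name and the small matrix are hypothetical.

```python
def lowest_sum_variables(R, k):
    """Return indices of the k variables with the lowest sums of correlations
    with all other variables, and the corresponding k x k submatrix."""
    # Sum each row, leaving out the diagonal entry.
    sums = [sum(row) - row[i] for i, row in enumerate(R)]
    order = sorted(range(len(R)), key=lambda i: sums[i])
    keep = sorted(order[:k])
    sub = [[R[i][j] for j in keep] for i in keep]
    return keep, sub


# Hypothetical 4 x 4 correlation matrix.
R = [
    [1.0, 0.8, 0.7, 0.2],
    [0.8, 1.0, 0.6, 0.1],
    [0.7, 0.6, 1.0, 0.3],
    [0.2, 0.1, 0.3, 1.0],
]
keep, sub = lowest_sum_variables(R, 2)
print(keep)  # [1, 3]
```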

Table 1 contains the basic data on which the correlations are
based. The first column, labeled , indicates the total number of
elements for each variable or job. The other cells of the matrix contain
the number of elements common to the pair of jobs designated
by that row and that column.

The variables selected and their matrix of intercorrelations are
given in Table 2.

TABLE 1

[The entries of this table are not recoverable here.]

TABLE 2
The Correlation Matrix

[The table is printed rotated in the original; its entries are not recoverable here. Legible variable names: 8 Interviewer, Employment; 9 Librarian; 10 Messenger - Truck Driver; 11 Nurse; 16 Executive Secretary; 17 Supervisor, Print Shop; 18 Supervisor, Tabulating; 19 Telephone Oper. & Recept.]


4. The Factor Analysis 

The correlation matrix was factored by the complete centroid 
method to six factors, at which point the residuals were all quite 
small. The centroid matrix was rotated to a satisfactory simple struc- 
ture with four planes which had appreciable variance, one residual 
factor, and one first-order general factor. Both the residual factor 
and the first-order general factor were set orthogonal to each other 
and to the other four planes. 


TABLE 3
The Centroid Factor Loadings

        I      II     III     IV      V      VI     h²

 1    .780  -.278  -.137   .187  -.059   .049   .748
 2    .795  -.170  -.181   .140   .222  -.093   .763
 3    .818  -.208   .033   .100  -.116  -.109   .737
 4    .817  -.253  -.238   .140   .070   .129   .810
 5    .796   .256  -.073  -.088   .158   .073   .737

 6    .817  -.178  -.144  -.056   .127   .065   .737
 7    .710   .053   .151  -.169   .088   .121   .558
 8    .783   .177  -.291   .080  -.142  -.028   .756
 9    .799   .264  -.080  -.035  -.142  -.062   .736
10    .699  -.028   .343   .139  -.088  -.049   .634

11    .753   .151   .166   .103   .020  -.136   .591
12    .734   .115   .225   .046   .097   .132   .614
13    .794  -.260  -.040  -.091   .019  -.103   .708
14    .763  -.270   .114  -.282  -.081   .095   .754
15    .726  -.208   .214  -.232  -.132   .025   .685

16    .805   .147  -.239   .093  -.180   .145   .768
17    .767   .227  -.046  -.173  -.103  -.083   .682
18    .744   .236  -.140  -.195   .071  -.129   .672
19    .747   .191   .192   .185   .091   .093   .674
20    .772   .029   .144   .148   .095  -.133   .648





The centroid factor matrix (F_c) is given in Table 3, and the
final oblique factor matrix (V) is given in Table 4. Table 5 contains
the transformation matrix (Λ) leading from the centroid matrix (F_c)
to the final rotated matrix (V) by the equation

$$ V = F_c \Lambda . $$

The intercorrelations of the final reference vectors are given in Table
6.


TABLE 4
The Final Rotated Matrix of Factor Loadings

        A      B      C      D      E      F

 1   -.019   .063   .178   .456   .561  -.032
 2   -.034   .050  -.006   .380   .731  -.162
 3    .039   .183   .245   .232   .597  -.118
 4   -.031  -.049   .129   .517   .663   .015
 5    .218   .006  -.025   .062   .799   .160

 6    .008  -.086   .191   .310   .740  -.022
 7    .024   .056   .234  -.051   .683   .153
 8    .416  -.016  -.054   .306   .604   .059
 9    .407   .081   .030   .060   .654   .086
10   -.055   .420   .204  -.052   .527   .049

11    .117   .330   .014  -.033   .654   .001
12   -.029   .254   .104  -.022   .675   .218
13    .011  -.006   .302   .211   .669  -.181
14   -.022  -.076   .540   .064   .628   .009
15   -.001   .051   .508  -.036   .574  -.002

16    .350  -.018   .025   .338   .596   .216
17    .382   .011   .115  -.025   .667   .039
18    .353  -.063   .011   .008   .735  -.038
19    .013   .344  -.032   .027   .667   .223
20    .005   .323   .025   .058   .681  -.053


TABLE 5
The Direction Cosines of the Final Reference Vectors

        A     B     C     D     E     F

I      .14   .12   .18   .18   .85   .04
II     .60   .16  -.57  -.42   .13   .40
III   -.47   .69   .34  -.71   .01   .15
IV    -.24   .64  -.57   .47  -.19   .09
V     -.53  -.03  -.39  -.02   .47  -.13
VI    -.25  -.26   .19   .26  -.02   .89


TABLE 6
Intercorrelations of the Final Reference Vectors

        A     B     C     D     E     F

A     1.00
B     -.28   .99
C     -.18  -.24   .99
D     -.06  -.30  -.18  1.01
E      .00   .00   .00   .01  1.00
F      .00   .00   .00   .01   .00   .99

The direction cosines of the primary reference vectors, their in- 
tercorrelations, and the projections of the variables on the primaries 
were computed but do not add materially to the interpretations of 
the study and for this reason are not reproduced here. 
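
The relation between the centroid loadings, the transformation matrix, and the rotated loadings can be checked numerically. The sketch below is illustrative only: it uses the direction cosines as read from Table 5 and the centroid loadings of variable 1 as read from Table 3, and the function names are assumptions. Because the printed cosines are rounded to two places, agreement with Table 4 can be expected only to about the second decimal.

```python
# Direction cosines as read from Table 5
# (rows: centroid axes I-VI; columns: reference vectors A-F).
LAMBDA = [
    [0.14, 0.12, 0.18, 0.18, 0.85, 0.04],
    [0.60, 0.16, -0.57, -0.42, 0.13, 0.40],
    [-0.47, 0.69, 0.34, -0.71, 0.01, 0.15],
    [-0.24, 0.64, -0.57, 0.47, -0.19, 0.09],
    [-0.53, -0.03, -0.39, -0.02, 0.47, -0.13],
    [-0.25, -0.26, 0.19, 0.26, -0.02, 0.89],
]

# Centroid loadings of variable 1 as read from Table 3.
F_ROW_1 = [0.780, -0.278, -0.137, 0.187, -0.059, 0.049]

def project(f_row, lam):
    """One row of V: the centroid loadings times the transformation matrix."""
    return [sum(f_row[m] * lam[m][p] for m in range(len(f_row)))
            for p in range(len(lam[0]))]

def reference_vector_correlations(lam):
    """Scalar products of the columns of the transformation matrix."""
    cols = list(zip(*lam))
    return [[sum(x * y for x, y in zip(u, v)) for v in cols] for u in cols]

v_row_1 = project(F_ROW_1, LAMBDA)
corr = reference_vector_correlations(LAMBDA)
print([round(x, 3) for x in v_row_1])   # compare with row 1 of Table 4
print(round(corr[0][1], 2))             # compare with the A-B cell of Table 6
```

The same two functions applied to every row of Table 3 give a quick check on the legibility of the printed tables.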


5. Identification of Factors 


Factor A: Self-responsible jobs 
The principal jobs and their projections on Factor A are shown 
in the table below: 











No.   Name                        Loading on A
 8    employment interviewer          .42
 9    librarian                       .41
17    print shop supervisor           .38
16    executive secretary             .35
18    tabulating supervisor           .35





These jobs are all characterized by the individual’s independence 
and self-responsibility on the job, dealing with individuals outside of 
the company, a relatively high order of educational skills, and some 
administrative and supervisory competence. 


Factor B: Routine, entry occupations 


The major projections on Factor B were the following: 














No.   Name                        Loading on B
10    messenger, truck driver         .42
19    tele. oper. & recept.           .34
11    nurse                           .33
20    shop clerk                      .32





The basis for the common factor variance of this cluster is not
clear. These jobs seem to have little in common except that they are
comparatively routine and, in this organization, were classified as
“entry” jobs.
















Factor C: Skilled machine operation jobs 
The jobs and loadings characterizing this factor are as follows: 











No.   Name                        Loading on C
14    multigraph operator             .54
15    multilith operator              .51
13    key punch operator              .30





These jobs all require training on the job and involve a high 
level of skill in machine operation. This latter characteristic is in 
contrast to the activities involved in operating blueprinting machines. 


Factor D: Clerical jobs 


This factor is characterized by the following occupations: 











No.   Name                        Loading on D
 4    record clerk                    .52
 1    ordering & sched. clerk         .46
 2    posting clerk                   .38
16    executive secretary             .34
 6    sr. sales order clerk           .31
 8    employment interviewer          .31





These occupations are all clerical in nature, characterized by a 
moderate to high degree of educational skills, calling for neat appear- 
ance, and involving such activities as those of filing, posting, and re- 
cording. 


Factor E 


This factor is a first-order general factor entering into every oc- 
cupation included in the factor analysis. There are several possible 
ways of accounting for this factor. The job analyses used as the 
basis for generating the correlations tended to be of a general na- 
ture rather than highly itemized and specific. This would increase 
the apparent similarity of the jobs and their intercorrelations and 
consequently would generate a general factor. On the other hand, 
a general factor might appear even with a highly itemized job 
analysis. The implication would be that these occupations are a lot 
more alike than is sometimes thought, particularly by vocational guid- 
ance technicians. 













Factor F 


This is the residual factor and is of no psychological interest or 
significance. 


6. Discussion 

This study has certain definite limitations which should be point- 
ed out. 

No first factor analysis of a very large domain should be regarded 
as definitive when it is based on a sample of 20. The structure re- 
vealed may be regarded as a significant structure for these 20 vari- 
ables and perhaps for the 70 from which they were selected. But in 
view of the thousands of different occupations, no small sample suitable
for factor analysis by present methods is apt to reveal a structure
suitable for the universe of jobs.

Second, the job analyses themselves, as previously pointed out, 
should be detailed and specific and designed for the particular pur- 
poses the job family structure is to serve. The more superficial, ver- 
bal, and judgmental the job analyses, the more they will tend to be 
correlated and this can give rise to a first-order general factor that 
is, at least in part, an artifact. 

Third, the suitability of the correlation formula used is seriously 
open to question. The assumptions involved in the use of the formula 
are numerous and include the following: that the contributions of the 
elements are additive, uncorrelated, equally potent, and equally likely 
to be present or absent, and further that the total variances of the 
variables are equal. Without doubt, these assumptions are violated 
and to an unknown degree. For example, the elements are undoubted- 
ly correlated, some positively and some negatively. 
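
The role of these assumptions can be seen in a short derivation. The sketch below is not quoted from the cited source. It assumes, for illustration only, that each element contributes an uncorrelated component e_i of unit variance and that the score of a job is the simple sum of the components of its elements; it therefore covers the additivity, independence, and equal-potency assumptions but not the others listed above.

```latex
x_a = \sum_{i \in a} e_i, \qquad
x_b = \sum_{i \in b} e_i, \qquad
\operatorname{Var}(e_i) = 1, \quad
\operatorname{Cov}(e_i, e_j) = 0 \quad (i \neq j)

\operatorname{Var}(x_a) = N_a + N_c, \qquad
\operatorname{Var}(x_b) = N_b + N_c, \qquad
\operatorname{Cov}(x_a, x_b) = N_c

r_{ab}
  = \frac{\operatorname{Cov}(x_a, x_b)}
         {\sqrt{\operatorname{Var}(x_a)\,\operatorname{Var}(x_b)}}
  = \frac{N_c}{\sqrt{(N_a + N_c)(N_b + N_c)}}
```

If the elements are correlated or unequally weighted, the covariance is no longer equal to the count of common elements, and the formula holds only approximately.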

In general, effects of this sort might be expected to enhance or 
depress a general factor, correlations between reference vectors, and 
the error variance, but to have less effect on the ultimate interpreta- 
tions of the factors obtained by means of simple structure. Other 
methods of correlation not involving these assumptions could be tried 
and the logic of factor analysis adapted to the type of correlation. 

Perhaps the best approach of all would be to begin with the ob- 
verse problem of factoring job elements and then to proceed with new 
job analyses based on a set of independent elements. 














PSYCHOMETRIKA—VOL. 14, NO. 1 
MARCH, 1949 


NOTE ABOUT THE MULTIPLE GROUP METHOD 


L. L. THURSTONE 
THE UNIVERSITY OF CHICAGO 


This note directs attention to the basic similarity between a 
factor analysis method described by Holzinger in 1944 and what 
Thurstone has called the multiple group method. With minor modi- 
fications and the application of Holzinger’s method in several succes- 
sive cycles until the residuals vanish, the methods are essentially the 
same. 


The purpose of this note is to call attention to the fact that a 
paper by Holzinger on a new factoring method anticipated essential 
parts of the multiple group method of factoring which was described 
by the writer in this journal six months later (1, 2). Holzinger de- 
scribed his method as a factoring method that is applicable to a spe- 
cial kind of correlation matrix whereas, in fact, his method is en- 
tirely general with only slight modifications or extension. The state- 
ments that Holzinger made about his method and those which the 
writer has made about the multiple group method are entirely at vari- 
ance so that it is natural for the reader to infer that the two methods 
are entirely different. Closer inspection shows that Holzinger’s meth- 
od is not restricted to the special case which he describes. Holzinger 
had a better idea than he thought and his method can be readily ex- 
tended to cover any type of correlation matrix. It then becomes what 
the writer has called the multiple group method of factoring. 

When my attention was called to the similarity of the two papers, 
it was difficult to see how they could describe essentially the same 
method because of our respective statements. Holzinger said: “The 
simple method here presented is applicable to the factoring of a correlation
matrix in case the latter can be sectioned into portions of approximately
rank unity”; (1, 257) and also: “There is no guarantee,
of course, that any correlation matrix can be factored in the above 
manner, ... .” (1, 261). In my description of the multiple group 
method in the Psychometrika paper (2) and in the text on multiple- 
factor analysis (3), I said: “The multiple group method of factoring 
is general in that it can be used on a correlation matrix of any rank, 
any order, and any configuration of test vectors, . .. .” (3, 170). 
Holzinger limited himself to the case in which the variables could be
so arranged that each section of the correlation matrix was of unit
rank. This restriction does not apply to the multiple group method. 
I said: “The method of selecting the groups of tests is not crucial 
for the multiple group method of factoring, since the only require- 
ment is that the centroid vectors for the several groups shall be lin- 
early independent”; (3, 171) and, further: “The computer will se- 
lect, for each group, those tests which are nearly collinear, if such 
groups can be found in the correlation matrix. If such groups are not 
readily seen by inspection, then the grouping can be carried out by 
some other routine”; (3, 171) and “Further, it is not necessary that 
all of the tests in each group be collinear, or nearly so. All that is re- 
quired is that the whole correlation matrix be divided into sections 
that are linearly independent.” (2, 78). Holzinger said: “The ade- 
quacy of the solution may be tested in each section, . . . ,” (1, 257) 
and he describes the procedure for “testing the solution by checking 
the rank of submatrices.” (1, 261). The check consists in verifying 
that the correlations are proportional in each section. No such re- 
striction applies to the multiple group method. On this subject I said: 
“The grouping was chosen quite arbitrarily in this example in order 
to illustrate in a numerical example that the multiple group method 
of factoring is independent of the method of grouping the variables. 
The only requirement is that the centroid vectors of the several groups 
must be linearly independent.” (3, 175). On the treatment of resi- 
duals, I said: “If the first estimate of the number of clusters is too 
small, one merely repeats the process until the residuals vanish.” (3,
175).

With only a few minor modifications, Holzinger’s method be- 
comes what I have called the multiple group method of factoring. 
These minor modifications are as follows: 

1) Eliminate the restriction that the correlation matrix shall be 
divisible into sections of unit rank. 

2) Eliminate the reservation that the method may not be ap- 
plicable to any given correlation matrix. The method can easily be 
made general so that it is applicable to a correlation matrix of any 
order, any rank, and any configuration of tests. 

3) Eliminate the check by which each section of the matrix is
tested for unit rank because this check is not applicable to the mul- 
tiple group method of factoring. 

4) Holzinger’s description implies that the number of groups 
is equal to the rank of the correlation matrix. This restriction is not 
necessary for the multiple group method, which implies only that the 
number of groups is equal to or less than the rank of the reduced 
correlation matrix. If the number of groups should exceed the rank
of the reduced correlation matrix, then this fact will be discovered in
factoring the correlations between the centroid axes by the diagonal 
method. This is one step in the multiple group method. 

5) Instead of avoiding the computation of an orthogonal factor 
matrix F, this matrix is the objective to be attained in the multiple
group method (1, 261). 

The actual computations can then proceed just as Holzinger has 
described. The residual correlations are then computed for the num- 
ber of factors, which is equal to the number of groups that was used. 

6) If the residuals do not vanish, then one or more groups of 
tests are selected from the residual matrix, and the procedure is re- 
peated. The adequacy of the method is not determined by the resi- 
duals. If they do not vanish, the procedure is repeated until they do 
vanish. The additional columns so found are simply added to the columns
of the factor matrix F that were obtained in previous cycles.

If these minor modifications are made in Holzinger’s paper, and 
if the method which he describes is applied in several successive cycles 
until the residuals vanish, then his method becomes what the writer 
has called the multiple group method of factoring. Our failure to 
recognize the relation between the two papers was probably caused by 
the exposition in the earlier paper, which was limited explicitly to a 
special kind of correlation matrix, whereas the underlying idea could 
easily be extended to cover any correlation matrix. This circumstance 
leads to the inference that Holzinger evidently had a better idea than 
he realized when he limited his description to sections of unit rank. 
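
One cycle of the procedure discussed in this note can be sketched in code. The sketch below is a common matrix formulation of the multiple group method, offered as an illustration rather than as the exact computing routine of either paper: the correlations are summed by groups, the matrix of sums between the group centroids is factored by the diagonal (square-root) method, and the orthogonal factor matrix F and the residuals follow. The function names and the small rank-2 example are hypothetical.

```python
import math

def multiple_group_factors(R, groups):
    """One cycle of group factoring for a reduced correlation matrix R
    (communalities in the diagonal) and a list of index groups."""
    n, k = len(R), len(groups)
    # S[i][g]: sum of the correlations of test i with the tests of group g.
    S = [[sum(R[i][j] for j in grp) for grp in groups] for i in range(n)]
    # P[g][h]: sum of all correlations between group g and group h.
    P = [[sum(S[i][h] for i in groups[g]) for h in range(k)] for g in range(k)]
    # Diagonal method: P = T T' with T lower triangular.
    T = [[0.0] * k for _ in range(k)]
    for g in range(k):
        for h in range(g + 1):
            s = P[g][h] - sum(T[g][m] * T[h][m] for m in range(h))
            if g == h:
                if s <= 1e-12:
                    # More groups than the rank of R allows.
                    raise ValueError("group centroids are not linearly independent")
                T[g][g] = math.sqrt(s)
            else:
                T[g][h] = s / T[h][h]
    # Solve F T' = S row by row (forward substitution).
    F = []
    for i in range(n):
        f = [0.0] * k
        for g in range(k):
            f[g] = (S[i][g] - sum(f[h] * T[g][h] for h in range(g))) / T[g][g]
        F.append(f)
    return F

def residuals(R, F):
    """Residual correlations R - F F'."""
    n = len(R)
    return [[R[i][j] - sum(a * b for a, b in zip(F[i], F[j]))
             for j in range(n)] for i in range(n)]

# Hypothetical rank-2 example: R is built from known loadings A.
A = [[0.8, 0.1], [0.7, 0.2], [0.2, 0.7], [0.1, 0.8]]
R = [[sum(a * b for a, b in zip(u, v)) for v in A] for u in A]
F = multiple_group_factors(R, [[0, 1], [2, 3]])
max_residual = max(abs(x) for row in residuals(R, F) for x in row)
print(max_residual)
```

If the residuals do not vanish, further groups would be chosen from the residual matrix and the resulting columns appended to F, as described in modification 6.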


REFERENCES
1. Holzinger, Karl J. A simple method of factor analysis. Psychometrika, 1944,
9, 257-262.
2. Thurstone, L. L. A multiple group method of factoring the correlation matrix.
Psychometrika, 1945, 10, 73-78.
3. Thurstone, L. L. Multiple-factor analysis. Chicago: University of Chicago
Press, 1947.











PSYCHOMETRIKA—VOL. 14, NO. 1 
MARCH, 1949 


A STATISTICAL CRITIQUE OF THE USAFI TESTS OF 
GENERAL EDUCATIONAL DEVELOPMENT* 


WARREN G. FINDLEY AND NEAL B. ANDREGG 
THE AIR UNIVERSITY 


Data resulting from the administration of the USAFI Tests of 
General Educational Development to more than 1000 junior Air 
Force officers have been statistically analyzed to indicate the reli- 
ability of these tests, their correlation with school achievement, the 
comparability of their intercorrelations with intercorrelations among 
grades in school subjects, their capacity for differential diagnosis, 
their factorial composition, and their average item-test correlations. 
In the light of these findings and the finding that there is a low gra- 
dient between achievement on these tests and amount of formal edu- 
cation among these officers, the tests have been evaluated as possess- 
ing the practical validity suggested by their face validity for selec- 
tion of young Air Force officers for assignment to study at civilian 
colleges and universities. 


The number of persons who had taken the USAFI Tests of General
Educational Development had passed the two-million mark by June 1947.
By now, the number of veterans and others who have taken these general
achievement tests must be much greater.

In our opinion these tests merit the wide use they have had with 
veterans. They have been specially designed to measure those residues 
of formal or informal learning that an adult mind retains for ready 
use. Hence, on their face these tests may be accepted not only as fair 
to veterans as measures of their varied learnings, but also as potential 
indicators of generally important mental abilities for all adults. 

Because of such apparent merits, these tests were chosen for ex- 
perimental study in the United States Air Force to determine their 
value in selecting officers for assignment to study at civilian colleges 
and universities. The statistical data reported in this paper were ac- 
cumulated in the process of studying the merits of the tests for evalua- 
ting the mental competence of Air Force officers who are in much the 
same position as veterans, several years removed from formal study 
but anxious to resume. The results are therefore relevant not only to
the U.S. Air Force, but also to the situation in colleges where veterans
are studying.

*Paper read before the National Council on Measurements Used in Education,
Atlantic City, New Jersey, February 24, 1948. The writers are indebted to
Dr. James E. Greene, formerly Deputy for Research of the Educational Advisory
Staff, The Air University, for aid in planning and sponsoring this study.

The data are on students in the Air Tactical School, the basic 
school for Air Force officers already commissioned. All junior officers 
in the Air Force attend this four-month school, and it is chiefly from 
these student officers and other officers of similar age and experience 
that come the applications for assignment to study at civilian colleges 
and universities. 

All officers in Class 47-B and Class 47-C at the Air Tactical School 
were given the four college-level tests of the USAFI Tests of General 
Educational Development and Test 5 (General Mathematical Ability— 
High School Level). Since there is no college-level test of general 
mathematical ability in the battery, the high-school-level test was 
used. A measurement of the general mathematical ability of officers 
was essential because a majority of those assigned to study at civilian 
institutions pursue technical studies. 


Level of Achievement on GED Tests 
Table 1 indicates briefly the general caliber of approximately 500 
Air Tactical School students in each of two successive classes on each 
of the four college-level tests. 


TABLE 1 


Medians and Quartiles of Air Tactical School Students, Classes 47-B and 47-C, 
USAFI Tests of General Educational Development (College Level) 





                              Test 1        Test 4        Test 2        Test 3
                              English       Literary      Social        Natural
                              Expression    Materials     Studies       Sciences
                              47-B  47-C    47-B  47-C    47-B  47-C    47-B  47-C
Upper Quartile                 63    67      67    69      74    75      72    74
Median                         59    63      62    65      68    70      68    70
Lower Quartile                 52    60      57    60      63    65      64    66
Type I College Standard           55            57            60            61
Minimum College Standard          50            53            55            57


On all four tests 75% or more of Class 47-C made scores that were 
higher than those required for accreditation by Type I colleges in lieu 
of six semester hours study in each area tested. This was also true 
of Class 47-B if the test on English Expression is excluded. 

The consistently superior performance of Class 47-C is perhaps 
explained by the fact that this class had greater motivation than Class 
47-B. The administration of the USAFI Tests of General Educational 
Development was required for the first time in Class 47-B. Students 











TABLE 2

Standard Score Equivalents of Raw Scores Made by Students in Class 47-B,
Air Tactical School, on USAFI Tests of General Educational
Development, Grouped by Amount of Education*

[Only the HS Grad and C1 columns can be placed with confidence in the
source; the C2, C3, C Grad, and Total columns are largely missing.]

Correctness and Effectiveness of Expression (Test 1—College Level)
            HS Grad*    C1*
Mean           56       57
Q3             63       68
Median         57       56
Q1             51       52
N             168       80

Interpretation of Literary Materials (Test 4—College Level)
            HS Grad*    C1*
Mean           62       61
Q3             67       67
Median         62       61
Q1             58       56
N             161       77

Interpretation of Reading Materials in the Social Studies (Test 2—College Level)
            HS Grad*    C1*
Mean           67       67
Q3             72       71
Median         67       67
Q1             63       63
N             158       79

Interpretation of Reading Materials in the Natural Sciences (Test 3—College Level)
            HS Grad*    C1*
Mean           67       68
Q3             70       72
Median         66       68
Q1             63       64
N             153       75

General Mathematical Ability (Test 5—High-School Level)
            HS Grad*    C1*
Mean           62       63
Q3             68       69
Median         63       65
Q1             58       61
N             173       87

[Columns surviving in the source that cannot be assigned to a test or
group, each in the order Mean, Q3, Median, Q1, N:
59 63 60 53 54; 61 67 62 57 50; 64 69 64 57 77; 70 78 70 64 81;
70 75 71 65 73.]

*HS Grad—High-school graduate
C1—1 year college
C2—2 years college
C3—3 years college
C Grad—College graduate
Total—Includes a few students who were non-high-school graduates and a few students with 1 or
more years of graduate study.








TABLE 3

Standard Score Equivalents of Raw Scores Made by Students in Class 47-C,
Air Tactical School, on USAFI Tests of General Educational
Development, Grouped by Amount of Education*

Correctness and Effectiveness of Expression (Test 1—College Level)
          HS Grad*   C1*   C2*   C3*   C Grad*   Total*
Mean         63      63    63    65      65        64
Q3           66      68    67    69      69        67
Median       63      64    63    64      65        63
Q1           58      60    59    61      62        60
N           175      85    90    75      72       533

Interpretation of Literary Materials (Test 4—College Level)
          HS Grad*   C1*   C2*   C3*   C Grad*   Total*
Mean         64      65    64    66      66        65
Q3           68      69    68    69      69        69
Median       64      66    64    66      66        65
Q1           60      60    60    61      61        60
N           175      85    90    75      72       533

Interpretation of Reading Materials in the Social Studies (Test 2—College Level)
          HS Grad*   C1*   C2*   C3*   C Grad*   Total*
Mean         68      71    69    71      74        70
Q3           74      76    74    75      78        75
Median       68      70    69    71      73        70
Q1           64      66    64    66      68        65
N           175      85    90    75      72       533

Interpretation of Reading Materials in the Natural Sciences (Test 3—College Level)
          HS Grad*   C1*   C2*   C3*   C Grad*   Total*
Mean         68      70    70    70      71        70
Q3           72      74    74    74      76        74
Median       69      71    70    69      72        70
Q1           65      66    66    66      66        66
N           175      85    90    75      72       533

General Mathematical Ability (Test 5—High-School Level)
          HS Grad*   C1*   C2*   C3*   C Grad*   Total*
Mean         65      67    66    66      68        66
Q3           69      75    72    75      75        72
Median       66      67    66    66      68        66
Q1           61      62    61    60      63        61
N           175      85    90    75      72       533

*HS Grad—High-school graduate
C1—1 year college
C2—2 years college
C3—3 years college
C Grad—College graduate
Total—Includes 36 students with 1 or more years of graduate study.
















in that class felt that the test scores might be used by Air University 
authorities who make the selection of Air Force students for attend- 
ance at civilian universities, but they were not sure of it. By the time 
the tests were administered to the next class, 47-C, the fact that they 
were a part of the selection process was well known. Students also 
knew that their scores on these tests would be used by the Comman- 
dant of the Air Tactical School in making recommendations for fur- 
ther military schooling. 


GED Test Scores and Amount of Previous Formal Education 

Tables 2 and 3 show the standard score equivalents of raw scores 
made by students of both classes, grouped by amount of education. 
There is a large amount of overlap in scores made by high-school 
graduates and those made by students with varying amounts of col- 
lege education, including college graduates. Twenty-five per cent of 
the high-school graduates equaled or exceeded the median score for 
college graduates in all tests except the Interpretation of Reading Ma- 
terials in the Natural Sciences. Similarly, over 25% of the college 
graduates were below the median of high-school graduates. In Class 
47-C, 94 to 98 per cent of all high-school graduates met minimum college
standards on each of the four tests, and 85 to 91 per cent of this same
group met Type I college standards.

In all cases the trend is definite, that is, scores tend to increase 
with number of years of college, but the gradient is low. This low 
gradient may be explained partly by the fact that the tests were de- 
signed to measure the functional outcomes of a program of liberal 
education in several areas and to minimize the more immediate and 
temporary content objectives of special school subjects. The generally 
high level of scores made by Air Tactical School students and the low 
gradient of scores with added education are probably largely the re- 
sult of the rigorous screening process used by the Army Air Force 
during World War II. The war interrupted the formal education of 
a large number of these men who would normally have gone to college. 
Many of them have made considerable educational progress through 
self-directed reading and study, extension course programs, observa- 
tions, and direct experience. Certainly no generalizations about rep- 
resentative groups with varying amounts of education can be drawn 
from these data. For the Air Force’s purpose of selection, the low
gradient with respect to amount of formal education was a strong a
priori consideration in favor of the tests.













Predictive Validity of GED Tests 

Table 4 indicates the relationship between test scores and end- 
of-course grades at Air Tactical School. The criterion was a weighted 
average of students’ grades on twenty-odd measures, chiefly objective 
tests but including ratings on making a speech and leading a discus- 
sion. The average reliability of these individual measures is .75. In 
each class, the Social Studies test had the highest correlation with 
grades. First partial correlation coefficients were low, hence the mul- 
tiple correlations were only slightly higher than the correlation be- 
tween Social Studies test scores and grades. In general the correla- 
tions of these test scores with grades are of the same order of mag- 
nitude as similar correlations between entrance examinations and col- 
lege grades.* 

TABLE 4

Correlation of USAFI Tests of General Educational Development with Grades
at Air Tactical School, in Classes 47-B and 47-C

                                       Correlation      Correlation with Grades with
Test                                   with Grades      Social Studies Held Constant
                                       47-B   47-C           47-B   47-C
1—English Expression (College)         .36    .49            .14    .23
2—Social Studies (College)             .51    .55             ..     ..
3—Natural Sciences (College)           .41    .48            .12    .17
4—Literary Materials (College)         .45    .49            .19    .19
5—General Mathematics (H. S.)          .39    .42            .19    .20

N                                      389    533
Multiple Correlation
  Based on Tests 2 and 5               .54    .57
  Based on Tests 1 and 2                      .58
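
The partial and multiple correlations in Table 4 can be reproduced from the zero-order correlations by the standard formulas. The sketch below is an illustration using the Class 47-C values as read from Tables 4 and 6 of this paper; the function and variable names are assumptions, and the results can be expected to agree with the printed values only to within rounding.

```python
import math

def first_order_partial(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with two predictors."""
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_sq)

# Class 47-C correlations as read from Tables 4 and 6.
r_eng_grades = 0.49    # English Expression with grades
r_soc_grades = 0.55    # Social Studies with grades
r_math_grades = 0.42   # General Mathematics with grades
r_eng_soc = 0.62       # English Expression with Social Studies
r_soc_math = 0.51      # Social Studies with Mathematics

# English Expression with grades, Social Studies held constant.
partial_eng = first_order_partial(r_eng_grades, r_eng_soc, r_soc_grades)
# Multiple correlation of grades with Tests 2 and 5.
multiple_2_5 = multiple_r_two_predictors(r_soc_grades, r_math_grades, r_soc_math)
print(round(partial_eng, 2), round(multiple_2_5, 2))
```

The low partial correlations show why adding a second test raises the multiple correlation only slightly above the correlation for Social Studies alone.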


Reliability of GED Tests 

Table 5 reports the reliability coefficients of the five tests in the 
two classes at Air Tactical School. The coefficients are based on use 
of the split-halves method and Spearman-Brown formula. This meth- 
od of estimating reliability is particularly appropriate here because 
the tests are work-limit tests and half-scores are consequently not 
affected by failure of students to finish. 
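
The split-halves procedure with the Spearman-Brown step-up can be sketched as follows. The half-test scores are hypothetical and the function names are assumptions; only the two formulas themselves (the product-moment correlation and the Spearman-Brown formula) are standard.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half, factor=2):
    """Reliability of a test `factor` times as long as the part tests."""
    return factor * r_half / (1 + (factor - 1) * r_half)

# Hypothetical scores of six examinees on the two half-tests.
half_a = [10, 12, 9, 15, 11, 14]
half_b = [11, 11, 10, 14, 12, 13]

r_half = pearson_r(half_a, half_b)
r_full = spearman_brown(r_half)
print(round(r_half, 3), round(r_full, 3))
```

Worked backward, an observed full-test coefficient of .82 corresponds to a correlation of about .695 between the two halves.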

In setting up the split-halves in each test, care was taken to produce
half-tests. Only in the case of Test 5, General Mathematics, and
the spelling section of Test 1, English Expression, were odd items
matched against even items. In the other part of Test 1, the items of
two passages were matched against the items of two other passages.
In Tests 2, 3, and 4, all items on half the passages were made into one
half-test and the items on the other half of the passages were made
the other half-test. Passages for the two halves of a given test were
matched as far as possible for subject matter, special skills involved
(such as graph-reading), number of items, length of reading, and
position in test, in that order of consideration.

*In a subsequent class, Class 48-B, the current Army General Classification
Test was given to all 737 students. The correlation with grades was .42. In this
latter class the grades were based on highly similar and no less reliable data.

The uncorrected coefficients reported in the first two columns
would have to be considered low were it not for the fact that the populations
on which they are based are highly selected. Fairer judgments
may be made on the corrected coefficients at the right. The corrected
range is that found in the general college populations used in deriving
national norms on the tests. Since these norms are given in percentiles,
we have used the ratio of interquartile ranges rather than the
ratio of standard deviations in Kelley’s formula.
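
The correction for range can be illustrated numerically. The sketch below uses the relation as it is usually stated, in which the error variance is assumed equal in the selected and the unselected group, so that one minus the reliability varies inversely with the square of the spread. The observed coefficient and the sample interquartile range are read from Tables 5 and 1 for Test 1 in Class 47-C; the reference interquartile range of 12 is a hypothetical value, since the norm-group figures are not given here.

```python
def correct_reliability_for_range(r_obs, spread_obs, spread_ref):
    """Reliability expected in a group with spread `spread_ref`, given the
    reliability `r_obs` observed in a group with spread `spread_obs`.
    Assumes the error variance is the same in both groups."""
    ratio_sq = (spread_obs / spread_ref) ** 2
    return 1 - ratio_sq * (1 - r_obs)

r_observed = 0.83          # Test 1, Class 47-C (Table 5)
iqr_sample = 67 - 60       # Test 1, Class 47-C quartiles (Table 1)
iqr_reference = 12         # hypothetical norm-group interquartile range

r_corrected = correct_reliability_for_range(r_observed, iqr_sample, iqr_reference)
print(round(r_corrected, 3))
```

Using the ratio of interquartile ranges in place of the ratio of standard deviations, as the writers did, leaves the computation unchanged so long as the two distributions have the same shape.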


TABLE 5 
Reliability of USAFI Tests of General Educational Development 
at Air Tactical School, Classes 47-B and 47-C 


Test                         Observed          Corrected for Range*
                             47-B    47-C        47-B   47-C
1—English Expression         .82     .83         .85    .94
2—Social Studies             .82     .84         .87    .91
3—Natural Sciences           .89     .89         .93    .98
4—Literary Materials         .80     .78         .86    .88
5—General Mathematics        .83     .81         .93    .88

N                            457-512 533


*Kelley, T. L. Statistical method. New York: Macmillan, 1924, p. 222, formula 178. 


The corrected coefficients indicate good reliability for all tests 
except Test 4, where even the corrected coefficients are both below .90. 
Test 3 shows consistently highest reliability. Test 5 suffers in the 
comparisons because many of the examinees hit the ceiling, particu- 
larly in Class 47-C. This prevents any judgment on its reliability at 
the end of high school and simply means that the test cannot differ- 
entiate as reliably in a selected college population as is desirable. Simi- 
lar results would be found if Test 5 were given to engineering college 
students. 


Comparative Analysis of Intercorrelations 
Table 6 permits a number of comparisons and observations. The
table was arranged to bring out the hierarchy of correlations between
subject areas. Notice that the correlations decline as you read downward
within the set of correlations with English Expression, and that
the correlations are highest between consecutive subjects in the sequence:
English Expression, Literature, Social Studies, Natural Sciences,
Mathematics.


TABLE 6
Intercorrelations Among USAFI Tests of General Educational Development at
Air Tactical School and at Yale Compared with Intercorrelations
Among Grades at West Point

                                          GED Tests at          Grades at      GED Tests
Correlations Between                      Air Tactical School   West Point*    at Yale†
                                          47-B      47-C
English Expression and Literature         .53       .69        [illegible]
English Expression and Social Studies     .50       .62          .47
Eng. Expression & Natural Sciences        .45       .58          .48
English Expression and Mathematics        .41       .50
Literature and Social Studies             .65       .69          .65          [illegible]
Literature and Natural Sciences           .59‡      .64‡         .49‡           .52
Literature and Mathematics                .36       .46          .42
Social Studies and Natural Sciences       .65       .68          .60            .57
Social Studies and Mathematics            .49       .51          .46
Natural Sciences and Mathematics          .61‡      .64‡         .73‡
Number of cases                           389       533          497            185

*From Brigham, C. C. A study of error. New York: College Entrance Examination Board,
1932, p. 38.

†Crawford, Albert B. and Burnham, Paul S. Trial at Yale University of the Armed Forces
Institute General Educational Development Tests. Educ. psychol. Meas., 1944, 4, 261-270.

‡The science grades from West Point include only Physics and Chemistry, and the grades are
based almost exclusively on problem-solving; USAFI Test 3, Interpretation of Reading Materials in
the Natural Sciences, covers biological science as well as physical science and requires quantitative
reasoning rather than problem computations. These facts account for the relatively higher correlation
between science and mathematics in West Point grades and the relatively higher correlation
between science and literary materials in the USAFI tests.

Comparison between the correlations in the two classes at Air 
Tactical School shows the effect of stronger motivation in the second 
class, resulting in higher correlations.* The relatively greater increase 


*Here and elsewhere in the article it is argued that uniformly high motiva- 
tion on these work-limit tests tends to produce higher intercorrelations than would 
prevail under conditions involving less uniformly high motivation. Critical readers 
of the manuscript have pointed out that higher motivation might well have re- 
sulted in narrower ranges of scores, with resultant lower intercorrelations. The 
writers are independently certain that factors were operating in Class 47-C that 
were calculated to produce more uniformly high motivation in taking the tests 
than prevailed in Class 47-B. The higher medians and [illegible] narrower
interquartile ranges shown for Class 47-C in Table 1 confirm this. Yet the reliability
coefficients in Table 5 are approximately the same in both classes and the inter- 
correlations in Table 6 show an increase for Class 47-C. It is felt that it is a fair 













in correlations involving the test of English Expression is attributable 
to the fact that in Class 47-B this was given for a pretest of English 
Expression some time before the other tests, while in Class 47-C it 
was given as part of the battery. 

Comparison of each of the first two columns with the intercorre- 
lations of grades at West Point shows that these intercorrelations 
among grades are of the same magnitude as the intercorrelations 
among corresponding test scores. This finding bears on the criticism 
often made of the Tests of General Educational Development that they 
depend largely on reading or general ability. It is probable that West 
Point grades are more reliable than were the test scores at the Air 
Tactical School. Even allowing for somewhat greater attenuation in 
the test scores, however, the common factor in these tests at Air Tac- 
tical School is hardly greater than the common factor in achievement 
as measured in a related undergraduate institution. 

By way of comment it should be noted that the intercorrelations 
among the tests were probably heightened by the fact that they are 
work-limit tests. Any factor of this sort in the tests, however, is to 
be welcomed, for the typical college study situation is a work-limit 
situation. 

Finally, comparison of the intercorrelations at Air Tactical School 
with those given in the last column, from Crawford and Burnham’s 
study at Yale, shows again that all are of the same general magnitude. 
The agreement is close enough to indicate that our basic data are not 
influenced by peculiar factors different from those operating in a civil- 
ian college situation. The Yale data agree best with the data in Class 
47-B where motivation was not so uniformly high. It is reasonable 
to suppose that if the tests had been administered to the Yale students 
as part of their admissions program the intercorrelations would have 
approached those found in Class 47-C under high uniform motivation. 


Differential Guidance Value of GED Tests 
The data of Table 7 bear on the question as to whether the five 
tests may be viewed as efficient measures for differential guidance 
among the subject fields represented. The criterion applied is the extent 
to which differences between scores on pairs of tests tend to exceed 
differences that would be found between scores on equally reliable 





hypothesis that under the ordinary motivation in Class 47-B, some examinees did 
well where they readily excelled but poorly where they were not so gifted or in- 
terested; in Class 47-C, a comparable group of examinees under uniformly high 
motivation brought themselves through extended effort to relatively high levels 
of achievement in subjects in which they were not inherently interested or gifted, 
while they were unable to do significantly better in subjects in which they achieved 


readily. 













TABLE 7 
Differential Prediction of USAFI Tests of General Educational Development at 
Air Tactical School, Classes 47-B and 47-C 


                                              Correlation      Excess Differences*
Pairs of Tests                                47-B    47-C        47-B    47-C

English Expression and Literary Materials      .53     .69         21%     12%
English Expression and Social Studies          .50     .62         25%     20%
English Expression and Natural Sciences        .45     .58         29%     26%
English Expression and Mathematics             .41     .50         28%     24%
Literary Materials and Social Studies          .65     .69         14%     12%
Literary Materials and Natural Sciences        .59     .64         23%     19%
Literary Materials and Mathematics             .36     .46         29%     23%
Social Studies and Natural Sciences            .65     .68         21%     20%
Social Studies and Mathematics                 .49     .51         25%     24%
Natural Sciences and Mathematics               .61     .64         24%     21%


*Kelley, T. L. A new method for determining the significance of differences in intelligence and 
achievement test scores. J. educ. Psychol., 1923, 14, 321-333. 

Segel, David. Differential diagnosis. Baltimore: Warwick and York, 1934. 

Bennett, George K. and Doppelt, Jerome E. The evaluation of pairs of tests for guidance use. 
Educ. psychol. Meas., 1948, 8, 319-325. 


tests of a common function. Kelley, Gates, Segel, and Bennett all ac- 
cept as a minimum standard for differential guidance that the pro- 
portion of differences in excess of the chance proportion shall be 25%. 
Applying this standard to the intercorrelations in Class 47-B shows 
that most of the paired tests approximately meet this standard. In 
Class 47-C, where motivation was uniformly high, only one pair (English
Expression and Natural Sciences) meets the standard of 25% and
several pairs fall far short. Generally speaking, then, the tests do 
not differentiate efficiently between abilities in the separate fields 
although extreme differences in ability would be discernible. 

The work-limit character of the tests, which is an advantage for 
accreditation, placement, and general predictive purposes, serves to 
render them less effective for differential guidance purposes. The man- 
ner in which this feature operates may be illustrated from the experi- 
ence of the senior author in taking the civilian forms of the test. In the 
fields of social studies and mathematics, his fields of major activity 
and competence, he made good scores in short order. In natural science, 
however, he would have made only a moderate score if he had taken the 
test under a restrictive time-limit. Taking full advantage of unlimited 
time, however, he raised his score on the natural science test to rough- 
ly the same order as his social studies and mathematics scores. On the 
literary materials test, he made only a moderate score under short 













time limits and could not raise his score by taking additional time. 
This corresponds well with his undergraduate and subsequent experi- 
ence, namely, ability and interest in social studies and mathematics 
that led to good achievement readily, ability to achieve well in sci- 
ence if he would give extra time and effort, and inability to raise his 
achievement in English literature to distinction even with extended 
effort. Thus, scores on these work-limit tests would have served to 
guide him differentially away from English toward social studies and 
mathematics, but would have failed to guide him away from science 
in the way a time-limit test would have done. 


TABLE 8 
Factor Analysis of Correlations Among USAFI Tests of General Educational 
Development, Air Tactical School, Classes 47-B and 47-C* 


                                    Factor 1      Factor 2
Test                      Class     (General)   (Quantitative)   Communality   Reliability

1—English Expression      47-B        .64           .14              .42           .82
                          47-C        .78           .10              .62           .83

4—Literary Materials      47-B        .85           .00              .72           .80
                          47-C        .88           .00              .78           .78

2—Social Studies          47-B        .76           .25              .64           .82
                          47-C        .79           .19              .67           .84

3—Natural Sciences        47-B        .65           .52              .70           .89
                          47-C        .71           .49              .73           .89

5—General Mathematics     47-B        .45           .62              .58           .83
                          47-C        .53           .55              .58           .81

                                                                    47-B    47-C
Maximum Residual after Removing 2 Factors                            .04     .08
Maximum Difference between Guessed and Computed Communalities        .016    .013


*Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936, pp. 478-496. 


Factor Analysis of GED Tests 


Table 8 shows the results of factor analysis of this five-test battery.
It indicates by the small residuals that the common elements in
the battery can be accounted for by two factors. The second factor 
is evidently quantitative, being present significantly only in Test 3, 
Natural Sciences, and Test 5, General Mathematics, and most heavily 
in the latter. The first factor is doubtless a composite of linguistic 
ability, general reasoning ability, and general motivation. To have 
separated out these factors would have required inclusion of tests of 













each of these factors independent of the others. The fact that general 
motivation is present in Factor 1 is to be judged from the relative in- 
crease in the weights of this factor in all tests between Class 47-B 
and Class 47-C. 
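
The loadings in Table 8 can be checked against the reported communalities with a short sketch. It assumes the two factors are orthogonal, so that a test's communality is the sum of its squared loadings; the source does not state this explicitly. The dictionary simply transcribes the Factor 1 and Factor 2 columns of Table 8, and the names `LOADINGS` and `communality` are illustrative.

```python
# Loadings transcribed from Table 8: test -> class -> (Factor 1, Factor 2).
LOADINGS = {
    "English Expression": {"47-B": (0.64, 0.14), "47-C": (0.78, 0.10)},
    "Literary Materials": {"47-B": (0.85, 0.00), "47-C": (0.88, 0.00)},
    "Social Studies": {"47-B": (0.76, 0.25), "47-C": (0.79, 0.19)},
    "Natural Sciences": {"47-B": (0.65, 0.52), "47-C": (0.71, 0.49)},
    "General Mathematics": {"47-B": (0.45, 0.62), "47-C": (0.53, 0.55)},
}


def communality(loadings):
    """Sum of squared loadings (assumes orthogonal factors)."""
    return sum(a * a for a in loadings)


for test, by_class in LOADINGS.items():
    h2_b = communality(by_class["47-B"])
    h2_c = communality(by_class["47-C"])
    # Factor 1 is read in the text as carrying general motivation,
    # so its change from 47-B to 47-C is shown alongside.
    gain = by_class["47-C"][0] - by_class["47-B"][0]
    print(f"{test:20s} h2(47-B)={h2_b:.3f} h2(47-C)={h2_c:.3f} F1 gain={gain:+.2f}")
```

The computed values should agree with the Communality column to within about .01 to .02, which is the order of the maximum difference between guessed and computed communalities reported in the table.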

It is interesting to note in passing that arrangement of the tests 
in Table 8 in the subject sequence mentioned in discussion of Table 6 
results in generally declining weights in Factor 1 and generally ris- 
ing weights in Factor 2. This statistical evidence lends further sup- 
port to the thesis of a subject continuum from English to mathe- 
matics. A further special feature of this continuum is shown in the 
trend of validity coefficients in Table 9. 


TABLE 9 
Validity and Difficulty of Test Items, USAFI Tests of 
General Educational Development 


                                               Average       % of Items        Average
                                               Validity      Meeting a         Difficulty
Test                                   N       Coefficient   Standard of .25   (% Correct)

1—English Expression
   (High-School Level)               400*        .29             70                77
1—English Expression
   (College Level)                   160†        .29             59                66
4—Literary Materials
   (College Level)                   400‡        .30             66                61
2—Social Studies
   (College Level)                   400‡        .38             79                65
3—Natural Sciences
   (College Level)                   400‡        .40             88                65
5—General Mathematical Ability
   (High-School Level)               400‡        .42             88         [illegible: 717]


*Student officers of Class 47-A, Air Tactical School.
†Members of Academic and Administrative Staff, Headquarters, The Air University.
‡Student officers of Class 47-B, Air Tactical School.


Comparative Analysis of Items in GED Tests 
The formula for the validity coefficient used in obtaining the data
reported in Table 9 is

$$\text{V.C.} = \frac{5(A - E) + 2(B - D)}{\sqrt{8RW}},$$

where
















A represents the number of students in the top 20% of the class
(total score) who answered the item correctly,

B represents the number of students in the next highest 20% of
the class (total score) who answered the item correctly,

D represents the number of students in the next-to-lowest 20%
of the class (total score) who answered the item correctly,

E represents the number of students in the lowest 20% of the
class (total score) who answered the item correctly,

R is the number of correct responses, and

W is the number of incorrect responses and omissions.

The formula above was adapted from the formula*

$$\text{V.C.} = \frac{2(A - E) + (B - D)}{\sqrt{2RW}}.$$

In an experimental study of the adapted formula given first, based
upon 40 multiple-choice and true-false items given to 80 students, it
was found that the average difference between validity coefficients
obtained from that formula and biserial r’s was .01, the standard
deviation of the differences being .10.

In Table 9, the average validity coefficients increase as one passes
down the curriculum continuum from English to mathematics. The
only exception to the trend is to be found in the spelling items in
Test 1 (English Expression) at both high-school and college levels. If
the spelling items were omitted from the average validity coefficients
for Test 1, these averages would be even lower. Of course, spelling is
objectively correct or incorrect, so that the trend in average validity
coefficients in Table 9 may be said to reflect the effect of increasing
objectivity of material as one passes from English to mathematics.
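
The two validity-coefficient formulas in this section can be sketched as small functions. The function names and the example item are hypothetical; the example assumes 400 examinees divided into fifths of 80, with the middle fifth contributing to R and W but not to the numerator.

```python
import math


def validity_coefficient(a, b, d, e, r, w):
    """Adapted formula: [5(A - E) + 2(B - D)] / sqrt(8RW)."""
    return (5 * (a - e) + 2 * (b - d)) / math.sqrt(8 * r * w)


def earlier_coefficient(a, b, d, e, r, w):
    """Formula it was adapted from: [2(A - E) + (B - D)] / sqrt(2RW)."""
    return (2 * (a - e) + (b - d)) / math.sqrt(2 * r * w)


# Hypothetical item: number answering correctly in each fifth, top to bottom.
correct_by_fifth = [80, 70, 55, 40, 20]
a, b, _middle, d, e = correct_by_fifth
r = sum(correct_by_fifth)   # all correct responses
w = 400 - r                 # incorrect responses and omissions

print(f"adapted V.C. = {validity_coefficient(a, b, d, e, r, w):.2f}")
print(f"earlier V.C. = {earlier_coefficient(a, b, d, e, r, w):.2f}")
```

An item answered equally often by every fifth gives a coefficient of zero under either formula, since both numerators depend only on the differences A - E and B - D.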


Summary and Conclusions 

1. Over 1000 junior officers in the U. S. Air Force were given 
the four college-level tests of the USAFI Tests of General Educational 
Development and Test 5 (General Mathematical Ability—High-School 
Level). Separate analyses were made of the results in each of the two 
classes of approximately 500 officers at the Air Tactical School of The 
Air University. 

2. In Class 47-C, 75% or more of the students were above the 
Type I College Standard on each of the four college-level tests. In 
Class 47-B, where the motivation was less uniform, this was also true 
of all tests except Test 1, English Expression. 


*Adkins, Dorothy C. and Toops, Herbert A. Simplified formulas for item 
selection and construction. Psychometrika, 1937, 2, 165-171.













3. When the results were analyzed with respect to the amounts
of previous formal education each officer had had, the gradient in 
achievement on each test was only slightly positive. This was ex- 
plained by the very high achievement of those with only high-school 
education, which was in turn explained as due to the high selection 
in the U. S. Air Force during World War II and to the fact that many 
of these students did not attend college only because they went directly 
from high school into the Air Force during the war. 

4. The correlation of test scores with grades in the Air Tactical 
School ranged from .36 to .55, with Test 2, Social Studies, showing 
the highest correlation in both classes. Use of a second test jointly 
with Test 2 produced a multiple correlation of only .58. 

5. Uncorrected reliability coefficients varied from .78 to .89. Co- 
efficients corrected for the range of the standardization population 
varied from .85 to .94. Test 3, Natural Sciences, was generally most 
reliable; Test 4, Literary Materials, was generally least reliable. 

6. Intercorrelations among the tests proved similar in magni- 
tude to intercorrelations among these tests at Yale and to intercorre- 
lations of grades at West Point. 

7. The tests have little differential guidance value. The propor- 
tion of differences in excess of chance is greater than 25% in the case 
of only one pair of tests (English Expression and Natural Sciences) 
in Class 47-C where motivation was uniformly high. 

8. The intercorrelations can be accounted for almost completely 
by two factors. The rotated factors are (1) a combination of linguis- 
tic ability and general motivation and (2) quantitative reasoning. 

9. Item analysis of the tests revealed a trend in item validity 
closely related to increasing objectivity of material in the subject con- 
tinuum from English through social studies and natural science to 
mathematics. 

10. These tests have been recommended and adopted for use in 
selecting Air Force officers for assignment to study at civilian col- 
leges and universities, especially in view of their face validity, their 
low gradient with respect to amount of formal education of young Air 
Force officers, their substantial correlation with grades at the Air 
Tactical School, and their generally satisfactory reliability. 











PSYCHOMETRIKA—VOL. 14, NO. 1 
MARCH, 1949 


ON THE THEORY OF TEST DISCRIMINATION* 


GEORGE A. FERGUSON 
McGILL UNIVERSITY 


This paper discusses the properties of distributions of test 
scores and advances the view that the properties of the distribution 
should depend on the function which the test is intended to perform. 
A theory of test discrimination is developed which defines discrimi- 
natory capacity in terms of the number of relations of difference es- 
tablished by the operation of administering a test of k items to a 
sample of n individuals. A simple proof is presented which indicates 
that maximum discrimination between individuals is achieved when 
tests are constructed to yield distributions of the rectangular form. 
A coefficient of test discrimination is developed. The problem of ob- 
taining in practice distributions approximating to the rectangular 
form is briefly discussed. 


1. On The Properties of Distributions of Test Scores 

In the construction of a mental test the properties of the distri- 
bution of scores, obtained in samples of the population for which the 
test is intended, may within limits be predetermined by the selection 
of items of a certain difficulty and discriminatory capacity, these be- 
ing appropriately defined. The test maker in controlling the mean 
and variance of the distribution of scores on the final form of his 
test employs a knowledge of simple relationships. For any test of k
items administered to n individuals the sample mean, $\bar{x}$, is the sum
of the sample difficulty values; that is,

$$\bar{x} = \sum_{i=1}^{k} p_i, \tag{1}$$

where $p_i$ is the proportion of individuals in the sample passing item i.
The sample variance, $S^2$, is a function of the sample difficulty values
and item interactions:

$$S^2 = \sum_{i=1}^{k} p_i(1 - p_i) + 2\sum_{i<j} (p_{ij} - p_i p_j), \tag{2}$$

where $p_{ij}$ is the proportion of individuals in the sample passing both
items i and j. Likewise the symmetry or asymmetry of the distribution,
its leptokurticity, normalcy, platykurticity or bimodality are
functions of the item difficulty values and item interactions, or some 


*This paper was prepared under the auspices of the Defence Research Board, 
Ottawa, Canada. 















defined index of item discriminatory capacity which is in turn a function
of the item interactions.
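
The relationships between score moments and item statistics stated in (1) and (2) can be checked numerically. The sketch below uses a small hypothetical 0/1 response matrix and takes the variance with divisor n, which is the form under which the identity in (2) is exact; all names are illustrative.

```python
from itertools import combinations

# Hypothetical responses: rows are individuals, columns are items (1 = pass).
RESPONSES = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
]


def score_moments(responses):
    """Mean and variance (divisor n) computed directly from total scores."""
    n = len(responses)
    scores = [sum(row) for row in responses]
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / n
    return mean, variance


def item_moments(responses):
    """The same two quantities built from item statistics, as in (1) and (2)."""
    n = len(responses)
    k = len(responses[0])
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    mean = sum(p)                                  # formula (1)
    variance = sum(pi * (1 - pi) for pi in p)      # first term of (2)
    for i, j in combinations(range(k), 2):
        p_ij = sum(row[i] * row[j] for row in responses) / n
        variance += 2 * (p_ij - p[i] * p[j])       # interaction terms of (2)
    return mean, variance


print(score_moments(RESPONSES))
print(item_moments(RESPONSES))
```

The two printed pairs agree up to floating-point error, which is the sense in which the test maker controls the mean through item difficulties and the variance through difficulties together with item interactions.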

The control which the test maker is able to exercise over the 
resulting distribution of scores is limited by at least three circum- 
stances. First, the degree of inconsistency exhibited by the type of 
test material used will render impossible the attainment of certain 
types of distributions. All test material exhibits some inconsistency, 
some more than others. In general the greater the internal consistency
exhibited by the type of test material the greater the control
which the test maker may exercise over the resulting distribution. 
With a highly consistent type of test material, markedly platykurtic 
and possibly bimodal distributions may be obtained in samples drawn 
from a defined population, while with other less consistent types of 
test material such distributions may not be attained at all in samples 
drawn from the same population. Second, the range of ability in the 
population for which the test is intended will affect the attainment 
of certain types of distributions of scores. In constructing tests for 
use on a population of children covering a broad age range, markedly 
bimodal distributions of scores may be obtained, while in construct- 
ing tests for use on a population covering a single year of age bimodal 
distributions may not be possible, and the degree of platykurticity 
may be fixed at a critical level. Third, the control which the test 
maker may exercise is obviously influenced by sampling considera- 
tions, and the circumstance that some test items when rearranged 
and included in a final form may function differently than they did in 
the preliminary draft. 

Since the test maker by the selection of appropriate items can 
predetermine within limits the properties of the resulting distribu- 
tion, it follows that a rational basis is required for a decision with 
regard to a distribution of a specified type in preference to distribu- 
tions of different types, within the range of types attainable. Thus 
the test maker must decide whether his purpose is best served by a 
leptokurtic, normal, or platykurtic distribution, a symmetrical or 
asymmetrical distribution, and so on. Test makers not infrequently
operate on the presumption that particular advantage attaches to 
distributions of scores approximating the normal form. In previous 
discussions of this subject [2, 4] I advanced the view that the prop- 
erties of the distributions should depend on the function which the 
test is intended to perform. If we are interested primarily in dis- 
crimination between two groups, rather than in discrimination be- 
tween the members of a single group, our purpose would be best 
served by the construction of a test yielding a bimodal distribution 
of scores with the point of partition at the abscissa of the minimal 
















ordinate between the two modes. The problem of discrimination between
groups occurs frequently, for example, in selection work where
we are required to divide a group of applicants into those who may 
be expected to be suitable and those who may be expected to be un- 
suitable in a certain class of employment. If on the other hand our 
purpose is to discriminate between two groups and at the same time 
discriminate between the members of one group, and not between 
the members of the other, the test which would most efficiently per- 
form this function would yield some type of asymmetrical distribu- 
tion. In many situations, however, our primary concern is with es- 
tablishing an order with respect to an “ability” between the members 
of samples drawn from a given population, and our interest in individuals
at one level of “ability” is no greater than our interest in individuals
at other levels of “ability.” Here the view was expressed [2, pp. 67-73]
that this purpose would be most efficiently served by the construction
of tests yielding distributions approximating the rectangular form.

While the arguments in favor of rectangular distributions may 
appear rather obvious to anyone familiar with theories of test con- 
struction, the logical presumptions involved in these arguments lack 
simplicity and clarity. My purpose here is to outline a simple theory 
of test discrimination, and to show that when our interest lies in dis- 
crimination between individuals, rather than between groups, maxi- 
mum discrimination is achieved with tests yielding rectangular dis- 
tributions of scores. 


2. A Theory of Test Discrimination 

All branches of science are concerned primarily with relations of 
difference rather than with relations of equivalence, and extensive 
effort has been devoted to the discovery and development of opera- 
tions which would identify properties with respect to which objects, 
persons, or events differed, to the functional relationships between 
such properties, and to their relevance to a variety of purposes. It 
may, indeed, be argued that all knowledge is a knowledge of relations 
in which difference is an element.* 

In the field of mental measurement we are concerned with rela- 
tions of difference between individuals with respect to certain proper- 
ties, usually termed “abilities.” These relations of difference take the 
form > or <. Statements with regard to how much greater or how 
much less cannot be made, at least where we maintain any respect for 


*This would appear to be substantially the epistemological position adopted 
by N. R. Campbell [1].










logical rigor, because the operations performed are not in conformity
with the logical criteria of additivity.

The operation of administering a test of k items to a sample of n
individuals establishes a relation between every item on the test and
every individual in the sample. If individual A passes item a the inference
is drawn that A’s “ability” is > the “ability” required to pass
item a. If A fails item b the inference is that A’s “ability” is < that
required to pass item b. These relations constitute all the information
made available by the operation, and any conclusions drawn about
the individuals in the sample or about the items on the test are inferred
from these relations, which are operationally determined. By
assigning a 1 for a pass and a 0 for a failure we replace the relations
> and < by the numerals 1 and 0, and by summation over the k
items we obtain a “score,” x, for each individual. For any test of k
items where 1 is assigned for a pass and 0 for a failure, scores are
represented by the k + 1 integers over the range from 0 to k. Such
scores serve the purpose of establishing a field of relations with respect
to an “ability,” operationally defined, between the n individuals
in the sample. If $x_i > x_j$ or $x_i = x_j$ or $x_i < x_j$, we infer that the
“ability” of i is greater than, equivalent to, or less than the “ability”
of j.


Since the purpose of the operation of administering a test is to 
enable observation of relations of difference between individuals, 
rather than relations of equivalence (where our concern is with in- 
dividuals rather than group differences), it follows that the instru- 
ment used in the operation, the test, should be designed to maximize, 
for given values of n and k, the number of relations of difference,
the > or < relations, and to minimize the number of relations of 
equivalence, the = relations. 





The number of different >, =, or < relations established by
the administration of a test to a sample of n individuals is

$$\tfrac{1}{2}(n^2 - n); \tag{3}$$

that is, the operation enables observation of a relation between every
individual in the sample and every other individual. This may be
written in the form

$$\frac{1}{2}\left[\Big(\sum_{i=0}^{k} f_i\Big)^2 - \sum_{i=0}^{k} f_i\right], \tag{4}$$

where $f_i$ is the frequency obtaining a score value i, and the summation
extends over the k + 1 possible score values from 0 to k.
The number of = relations is readily observed to be

$$\frac{1}{2}\left[\sum_{i=0}^{k} f_i^2 - \sum_{i=0}^{k} f_i\right], \tag{5}$$

and the number of > or < relations is*

$$\frac{1}{2}\left[\Big(\sum_{i=0}^{k} f_i\Big)^2 - \sum_{i=0}^{k} f_i^2\right]. \tag{6}$$

(5) plus (6) equals $\tfrac{1}{2}(n^2 - n)$ because the relations are of two kinds
only, equivalence or difference.

Examination of (5) and (6) above indicates that for given values
of n and k the number of = relations is a minimum and the number
of > or < relations is a maximum when

$$f_0 = f_1 = f_2 = \cdots = f_k = \frac{n}{k+1},$$

that is, when the distribution of scores takes the rectangular form.

In a consideration of distributions which are symmetrical about
the mean, the number of > or < relations is of course zero in the
extreme case where all n individuals make the same score. The number
increases as leptokurticity decreases and the distribution approaches
the normal form. It continues to increase with increase in
platykurticity until a maximum is reached with the rectangular form.
The number of such relations will then decrease as bimodality is introduced.
Any skewness in the distribution will reduce the number
of > and < relations, the number of such relations being of course
zero in the extreme case where all n individuals make either zero or
perfect scores.
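
The counts in (3), (5), and (6) follow directly from the score frequencies, as the sketch below illustrates. The function name and the three 12-person distributions over four score values are hypothetical and chosen only to show the rectangular case giving the most relations of difference.

```python
def relation_counts(freqs):
    """Return (total, equivalence, difference) pair counts for score frequencies.

    freqs[i] is the number of individuals obtaining score i, for i = 0..k.
    """
    n = sum(freqs)
    sum_sq = sum(f * f for f in freqs)
    total = (n * n - n) // 2            # formula (3)
    equivalence = (sum_sq - n) // 2     # formula (5)
    difference = (n * n - sum_sq) // 2  # formula (6)
    return total, equivalence, difference


# Hypothetical distributions of n = 12 individuals over k + 1 = 4 score values.
examples = {
    "rectangular": [3, 3, 3, 3],
    "peaked": [1, 5, 5, 1],
    "skewed": [6, 3, 2, 1],
}
for name, freqs in examples.items():
    print(name, relation_counts(freqs))
```

Integer division is safe here because each of the three bracketed quantities is always even: for instance, the bracket in (5) is the sum of f(f - 1) over score values.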


3. An Illustration 
The discussion above may be illustrated by reference to a fictiti- 
ous example. Table 1 shows the frequency distribution of four 7-item 
tests designed to yield four different types of distributions of scores, 
a leptokurtic, binomial, rectangular, and bimodal distribution. 
It will be observed that in the binomial case 6476, or 79.68%, of
all relations established by the operation are relations of difference, 


*The value of the expression in (6) above, the number of relations of difference,
is of course not independent of n. By dividing (6) by $n^2$, and for convenience
discarding the $\tfrac{1}{2}$, we obtain the expression $1 - \sum f_i^2/n^2$, which is independent
of n. This expression is a measure of variability, although probably not a useful
one. It is not entirely without interest to observe that it is invariant with respect
to the ordering of the classes, and as such is more general than measures of variability
which involve the assignment of numerals to represent position in an ordered
series, or the introduction of a variate-scale. In a statistical theory based
on relations only, this expression is the analogue of the variance.
















TABLE 1 
A Comparison of the Discriminatory Capacity of Four Tests Yielding Different 
Distributions of Scores 


(n = 128, k = 7)

(a)                              (b)           (c)          (d)           (e)
Score                        Leptokurtic    Binomial    Rectangular    Bimodal

0                                 0             1           16            5
1                                 8             7           16           10
2                                14            21           16           35
3                                42            35           16           14
4                                42            35           16           14
5                                14            21           16           35
6                                 8             7           16           10
7                                 0             1           16            5
n                               128           128          128          128

No. of = relations,
  formula (5)                  1960          1652          960         1482
No. of > or < relations,
  formula (6)                  6168          6476         7168         6646
Total no. of relations,
  formula (3)                  8128          8128         8128         8128
% of = relations              24.11%        20.32%       11.81%       18.23%
% of > or < relations         75.89%        79.68%       88.19%       81.77%
Coefficient of discrimination,
  formula (8)                  .8604         .9035       1.0000        .9272













while in the rectangular case 7168, or 88.19%, a maximum, are rela- 
tions of difference. 


4. A Coefficient of Test Discrimination 
For any test of k items administered to a sample of n individuals,
the number of relations of difference is given by the expression in (6)
above. When $f_0 = f_1 = f_2 = \cdots = f_k = \frac{n}{k+1}$ this expression is a maximum
and is equal to

$$\frac{1}{2}\left[n^2 - \frac{n^2}{k+1}\right]. \tag{7}$$













We then define a Coefficient of Test Discrimination, δ, by dividing
(6), the observed number of relations of difference, by (7), the maximum
number of such relations. Since $n = \sum f_i$,

$$\delta = \frac{n^2 - \sum_{i=0}^{k} f_i^2}{n^2 - \dfrac{n^2}{k+1}}. \tag{8}$$

This coefficient will take values ranging from 0, when all individuals
make perfect scores, to 1, when the number of relations of difference
is a maximum, and the distribution of scores is rectangular in form.
For purposes of illustration, values of δ have been calculated for the
four fictitious distributions in Table 1 and appear in the last row of
that table.
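
As a check on formula (8), the coefficient can be computed for the four frequency distributions of Table 1. This is a minimal sketch; the function and dictionary names are illustrative, and the frequencies are transcribed from columns (b) to (e) of that table.

```python
def discrimination_coefficient(freqs):
    """Coefficient of test discrimination, formula (8).

    freqs[i] is the number of individuals obtaining score i, for i = 0..k.
    """
    n = sum(freqs)
    score_values = len(freqs)                       # k + 1
    observed = n * n - sum(f * f for f in freqs)    # twice formula (6)
    maximum = n * n - n * n / score_values          # twice formula (7)
    return observed / maximum


# Frequencies for scores 0..7 from Table 1 (n = 128, k = 7).
TABLE_1 = {
    "leptokurtic": [0, 8, 14, 42, 42, 14, 8, 0],
    "binomial": [1, 7, 21, 35, 35, 21, 7, 1],
    "rectangular": [16, 16, 16, 16, 16, 16, 16, 16],
    "bimodal": [5, 10, 35, 14, 14, 35, 10, 5],
}

for name, freqs in TABLE_1.items():
    print(f"{name:12s} delta = {discrimination_coefficient(freqs):.4f}")
```

The computed values agree with the last row of Table 1 to within one unit in the fourth decimal place, and the rectangular distribution gives exactly 1.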





5. Concluding Observations 

In the present paper we have defined discriminatory capacity in 
terms of the number of relations of difference established by the operation
of administering a test of k items to a sample of n individuals.
We have demonstrated that the discriminatory capacity, defined in 
such terms, is a maximum when the distribution of scores is rectangu- 
lar in form. It follows, therefore, that where we are interested in 
obtaining maximum discrimination between individuals, items should 
be selected and tests constructed to yield rectangular distributions of 
scores in the population for which they are intended. Experience in 
the field of test construction suggests that in most populations and 
with test material which exhibits any reasonable degree of internal 
consistency, distributions approximating to the rectangular form can 
readily be obtained by the selection of appropriate items. Tests yield- 
ing distributions of scores approximating to the rectangular form 
were constructed for use in the Canadian Army during the war. Gen- 
erally as we restrict the range of item difficulty values, selecting a 
higher proportion of items with difficulty values in the neighbor-
hood of .5, we increase the platykurticity of our distribution, and by
limiting the range sufficiently we can obtain in most situations ap- 
proximately rectangular distributions. In previous discussion [2, pp. 
67-70] I attempted to demonstrate that increased platykurticity, ob- 
tained by restricting the range of item difficulty values, resulted in 
increased inter-item covariance and increased reliability. These and 
related theorems were subsequently given a much more rigorous treat- 
ment by Gulliksen [3]. 

The position has been widely held in psychological measurement 
that tests should be constructed to yield distributions approximating 




the normal form. Two reasons have resulted in the adoption of 
this position. First, the vague belief exists that certain abilities 
are normally distributed in the population, a statement which is mean- 
ingless unless ability is given an operational definition. Second, a 
great body of statistical method has been based on normal probability
theory; consequently estimates of many statistical parameters are 
more meaningful when the distributions used in their computation 
are approximately normal. This second reason, however, lacks rele- 
vance in most situations. Tests are not constructed and used pri- 
marily to enable the calculation of statistics, but rather to make known 
certain relations between individuals for a variety of practical pur- 
poses. The assumption of normality in many arguments rests on a 
philosophical basis only and should not be invoked unless it can be 
shown to be both necessary and useful in a particular context.

To sum up, my position is that tests are instruments for perform- 
ing operations which enable the observation of relations of difference 
between individuals with respect to an “ability,” operationally defined. 
Tests should be constructed to maximize the efficiency of this opera- 
tion. If efficiency is viewed in terms of discriminatory capacity and 
discriminatory capacity in terms of relations of difference, it follows 
that maximum efficiency is obtained when tests are constructed to 
maximize the number of relations of difference, that is, to yield rec- 
tangular distributions of scores. The construction of tests to yield 
distributions approximating the normal form results in a loss of dis- 
criminatory capacity. 


REFERENCES 

1. Campbell, Norman Robert. An account of the principles of measurement and
calculation. London: Longman’s, Green and Co. Ltd. 1928. Pp. 293.

2. Ferguson, George A. The reliability of mental tests. London: University of
London Press, Ltd. 1941. Pp. 150.

3. Gulliksen, Harold. The relation of item difficulty and inter-item correlation to
test variance and reliability. Psychometrika, 1945, 10, 79-91.

4. Jackson, R. W. B. and Ferguson, George A. A functional approach in test
construction. Educational and Psychological Measurement, 1943, 3, 23-28.














PSYCHOMETRIKA—VOL. 14, No. 1 
MARCH, 1949 


NOTE ON THE COMPUTATION OF PRODUCT-MOMENT 
CORRELATION COEFFICIENTS 


DOROTHY C. ADKINS 
THE UNIVERSITY OF NORTH CAROLINA 


This paper describes a systematic plan for computing all of the 
product-moment correlation coefficients among a number of vari- 
ables that has been taught by Professor Toops for many years. It 
offers several advantages over a scheme presented by Kossack in a 
recent issue of this journal. 


Recently this journal published an article by Kossack on the 
computation of zero-order product-moment correlation coefficients.* 
The present note redirects attention to a method which has been 
taught by Professor Toops for many years and which has several 
advantages over Kossack’s method. 


The Toops plan combines the two Kossack matrices into one, 
each cell having four entries, as illustrated.†

It is assumed that the values $\sum X_i$, $\sum X_i^2$, and $\sum X_i X_j$ are avail-
able. It is also recommended that a variable $X_t$ be defined as the sum
of the other variables and that its sum, sum of squares, and cross-
products with each of its component variables be obtained.* This
variable will provide a fool-proof summation check on computations.
Rows and columns are labeled for each variable. The corresponding
$\sum X_i$'s are written above the column labels and below the row labels,
as illustrated.

The first entry in a cell is $\sum X_i X_j$, the i and j corresponding to
the row and column intersecting at the cell. For diagonal cells, the
entry is $\sum X_i^2$.


*Kossack, Carl F. On the computation of zero-order correlation coefficients. 
Psychometrika, 1948, 13, 91-98. 


†For an early reference related to this plan, see Toops, Herbert A. Computing
intercorrelations of tests on the adding machine. J. appl. Psychol., 1922, 6,
172-184. See also Toops, Herbert A. Some possibilities of statistical analysis
rendered possible by recent applications of punched card and sorting equipment.
Ohio College Association Bulletin No. 181, 1948, pp. 2508-2514.


*Toops, Herbert A. Statistical checks on the accuracy of intercorrelation 
computations. Educ. res. Bull., 1927, 6, 385-391. 


[Table: illustration of the Toops computing matrix. Rows and columns are labeled for the variables $X_1, X_2, \ldots, X_n$ and $X_t$; the $\sum X_i$ values and the multipliers $1/\sqrt{L_{ii}}$ appear in the margins, and each cell holds four entries. The cell contents are not recoverable from the scan.]



After these basic entries have been made, the following checks 
of entries and computations must hold: 


1. $\sum^{N} X_t = \sum_{i=1}^{n} \left( \sum^{N} X_i \right)$.


The sum of the $X_t$ scores over the population must equal the
sum, for all variables 1 through n, of the sum of the $X_i$ scores over
the population. This check applies to the marginal entries above the
column labels and below the row labels.


2. $\sum^{N} X_i X_t = \sum_{j=1}^{n} \left( \sum^{N} X_i X_j \right)$.


The sum of the cross-products $X_i X_t$ over the population must
equal the sum, for j varying from 1 through n, of the sum of the
$X_i X_j$ scores over the population. This check applies to the first cell
entries in each row and in each column. The first cell entry in the
first row of the $X_t$ column must equal the sum of all other first cell
entries in that row, and so on. Similarly, the first cell entry in the
first column of the $X_t$ row must equal the sum of all the other first
cell entries in that column, and so on.

3. $\sum^{N} X_t^2 = \sum_{i=1}^{n} \left( \sum^{N} X_t X_i \right)$.

The sum of the $X_t^2$ values over the population must equal the
sum, for i varying from 1 through n, of the sum of the $X_t X_i$ values
over the population. The first entry in the $X_t$ diagonal cell must equal
the sum of all first cell entries in the $X_t$ row or in the $X_t$ column.

4. $\sum^{N} X_t^2 = \sum_{i=1,\,j=1}^{i=n,\,j=n} \left( \sum^{N} X_i X_j \right)$.

A single over-all check is provided by the fact that the sum of
the $X_t^2$ values over the population must equal the sum, for both i
and j varying from 1 through n, of the sum of the $X_i X_j$ scores over
the population. The first cell entry in the $X_t$ diagonal must equal the
sum of all first cell entries in the table exclusive of those in the $X_t$
row and column.
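
The four summation checks can be tried out on a small data set. The sketch below is only an illustration under stated assumptions: the scores (N = 5 individuals, n = 3 variables) are invented, and the helper names `total` and `cross` are introduced here and do not come from the paper. The check variable is formed as the row sum, as recommended above.

```python
# Hypothetical scores: N = 5 individuals (rows) on n = 3 variables (columns).
scores = [
    [2, 5, 1],
    [4, 3, 6],
    [1, 7, 2],
    [5, 2, 3],
    [3, 4, 4],
]
n_vars = len(scores[0])

# Append the check variable X_t, defined as the sum of the other variables.
rows = [row + [sum(row)] for row in scores]
t = n_vars  # column index of X_t


def total(i):
    """Sum of X_i over the population."""
    return sum(row[i] for row in rows)


def cross(i, j):
    """Sum of X_i * X_j over the population (sum of squares when i == j)."""
    return sum(row[i] * row[j] for row in rows)


variables = range(n_vars)
check1 = total(t) == sum(total(i) for i in variables)
check2 = all(cross(i, t) == sum(cross(i, j) for j in variables) for i in variables)
check3 = cross(t, t) == sum(cross(t, i) for i in variables)
check4 = cross(t, t) == sum(cross(i, j) for i in variables for j in variables)
print(check1, check2, check3, check4)  # True True True True
```

Because the scores are integers, the identities hold exactly. Altering a single stored sum or cross-product (a copying error, say) makes at least one of the checks fail, which is the purpose of carrying the extra variable.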

After these checks have been made, the first step is to compute
systematically, row by row, the values $N\sum X_i X_j - \sum X_i \sum X_j$, which
are recorded as the second entries in the cells. These are obtained
by multiplying the first cell entry by N and subtracting the product
of the $\sum X$ values for the row and column in question. For diagonal
cells, these entries are $N\sum X_i^2 - (\sum X_i)^2$. Such a term is labeled $D_i$
by Kossack and $L_{ii}$ by Toops, who extends the notation to refer to a
nondiagonal entry as $L_{ij}$.* If the entire table is filled in instead of
just half, as Kossack directs, the entries can be checked by symmetry
around the diagonal entries. Moreover, with the $X_t$ column available,
the sum of all the second entries in each row exclusive of the $X_t$
entry must equal the second entry in the $X_t$ column of that row.

The next step is to record across the top and down the left side
the reciprocals of the square roots of the second entries in the diago-
nal cells, $\frac{1}{\sqrt{N\sum X_i^2 - (\sum X_i)^2}}$, or $\frac{1}{\sqrt{L_{ii}}}$. The computation should be
checked at this point by multiplying $L_{ii}$ by $\frac{1}{\sqrt{L_{ii}}} \cdot \frac{1}{\sqrt{L_{ii}}}$, as for the
third and fourth cell entries to be described below.


The third entry in each cell is obtained by multiplying the value
$\frac{1}{\sqrt{L_{ii}}}$ for the row by the second cell entry. Thus $\frac{1}{\sqrt{L_{ii}}}$ is a constant
multiplier for the row in question. Although the third entries in the
cells will not check for symmetry around the diagonal, the summa-
tion check will still apply. Thus the sum of row entries exclusive of
the $X_t$ entry must equal the $X_t$ entry for the row.
The fourth entry in each cell is obtained by multiplying the value
$\frac{1}{\sqrt{L_{jj}}}$ for the column by the third cell entry. Thus $\frac{1}{\sqrt{L_{jj}}}$ is a constant
multiplier for the column in question. These entries, which may be
written as $\frac{L_{ij}}{\sqrt{L_{ii}}\,\sqrt{L_{jj}}}$ for nondiagonal entries, are the product-mo-
ment correlation coefficients sought, as a review of the steps will
make clear. They should check for symmetry, and the diagonal en-
tries, which may be written as $\frac{L_{ii}}{\sqrt{L_{ii}}\,\sqrt{L_{ii}}}$, should equal unity.


*The L terms can be readily computed by calculating machine. $\sum X_i X_j$ is
placed in the keyboard and multiplied by N. From this product, which should be
left in the middle dial, the product of $\sum X_i$ and $\sum X_j$ is subtracted by use of the
minus multiplier. The resulting number in the middle dial is $L_{ij}$.













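In summary, the successive entries are

     for nondiagonal cells:                                                  for diagonal cells:
1    $\sum X_i X_j$                                                           $\sum X_i^2$
2    $N\sum X_i X_j - \sum X_i \sum X_j = L_{ij}$                             $N\sum X_i^2 - (\sum X_i)^2 = L_{ii}$
3    $L_{ij} \cdot \frac{1}{\sqrt{L_{ii}}}$                                   $L_{ii} \cdot \frac{1}{\sqrt{L_{ii}}}$
4    $L_{ij} \cdot \frac{1}{\sqrt{L_{ii}}} \cdot \frac{1}{\sqrt{L_{jj}}} = r_{ij}$     $L_{ii} \cdot \frac{1}{\sqrt{L_{ii}}} \cdot \frac{1}{\sqrt{L_{ii}}} = 1$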
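
As a rough illustration of the four successive entries, the following Python sketch fills one cell at a time. It is a minimal sketch under stated assumptions rather than the Toops worksheet itself: the scores are invented, and the names `total`, `cross`, `L`, `recip`, and `cell` are introduced here for convenience.

```python
import math

# Hypothetical scores: N = 5 individuals (rows) on n = 3 variables (columns).
scores = [
    [2, 5, 1],
    [4, 3, 6],
    [1, 7, 2],
    [5, 2, 3],
    [3, 4, 4],
]
N = len(scores)
n_vars = len(scores[0])


def total(i):
    """Sum of X_i over the population."""
    return sum(row[i] for row in scores)


def cross(i, j):
    """Sum of X_i * X_j over the population (sum of squares when i == j)."""
    return sum(row[i] * row[j] for row in scores)


def L(i, j):
    """Second cell entry: N * sum(X_i X_j) - sum(X_i) * sum(X_j)."""
    return N * cross(i, j) - total(i) * total(j)


# Marginal multipliers 1 / sqrt(L_ii), one per variable.
recip = [1 / math.sqrt(L(i, i)) for i in range(n_vars)]


def cell(i, j):
    """Return the four successive entries for the cell in row i, column j."""
    first = cross(i, j)
    second = L(i, j)
    third = second * recip[i]    # multiply by the constant for the row
    fourth = third * recip[j]    # multiply by the constant for the column
    return first, second, third, fourth


# Print the fourth entries, i.e. the matrix of correlation coefficients.
for i in range(n_vars):
    print([round(cell(i, j)[3], 4) for j in range(n_vars)])
```

The third entries are not symmetric, since only the row multiplier has been applied; the first, second, and fourth entries are, and the fourth diagonal entries come out as unity up to rounding.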


The advantages of this method over the Kossack scheme are: 
(1) It uses only one computing matrix. (2) The use of reciprocals 
of $\sqrt{L_{ii}}$ values permits multiplication, first by row and then by col-
umn, by a constant multiplier, which is easier than division and gives
less error in decimals. (3) Provision for a symmetry check of the 
first, second, and fourth entries of each cell and the use of the sum- 
mation checks throughout make the computations fool-proof. 
















PSYCHOMETRIKA—VOL. 14, NO. 1 
MARCH, 1949 


BOOK REVIEWS


VAUGHN, K. W. (Ed.) National Projects in Educational Measurement. A Re- 
port of the 1946 Invitational Conference on Testing Problems. American Coun- 
cil on Education Studies, Series I, No. 28, Vol. XI, Aug. 1947. Pp. 80. 


CHAUNCEY, HENRY (Ed.) Exploring Individual Differences. A Report of
the 1947 Invitational Conference on Testing Problems. American Council on
Education Studies, Series I, No. 32, Vol. XII, Oct. 1948. Pp. 110.


These two pamphlets contain the papers presented at the Invitational Con- 
ferences on Testing Problems held in New York City on November 2, 1946 and 
November 1, 1947 under the auspices of the Committee on Measurement and 
Guidance of the American Council on Education. Transcripts of the discussions 
at the end of each session are also presented. So many interesting ideas and such 
diverse points of view are represented in the two series of papers and in the com- 
ments of the discussants that a reviewer is at once tempted to write two more 
pamphlets. Perhaps all that can be expected of a review of a number of papers 
on different topics, however, is an over-all indication of their coverage and a 
notion of the high lights. 

Several of the papers of the 1946 conference bear directly on its theme, na- 
tional measurement projects. Thus E. F. Lindquist directs attention to our 
marked need for a comprehensive treatise on the theory, art, and techniques of 
examining and outlines the plan whereunder such a book is currently in process, 
with about 100 noteworthy authors and collaborators. K. W. Vaughn describes 
three projects of the Graduate Record Office: the Measurement and Guidance 
Project in Engineering Education, the Graduate Record Examination, and the 
Inquiry into Postwar Conditions in American Colleges. The first represents a co- 
operative approach to a measurement problem sponsored by the Engineers’ Coun- 
cil for Professional Development, the American Society for Engineering Educa- 
tion, and the Carnegie Foundation for the Advancement of Teaching. In the dis- 
cussion of the Graduate Record Examination, the problem of developing suitable 
tests for students exposed to varying curricula is explored and difficulties in 
validating tests of this type noted. The third project on postwar conditions in 
American colleges, modeled after the Pennsylvania Study, uses the Graduate Rec- 
ord Examination and thus serves the secondary purpose of providing national 
norms. 

Arthur E. Traxler discussed a project on development and use of testing
procedures in the field of accounting. This project, initiated by the American
Institute of Accountants, is being worked on by a committee appointed by this
business group, representatives of Columbia University, and the Educational
Records Bureau. It thus offers an interesting example of cooperation between
business and educational groups. Howard R. Anderson reviews the objectives and
scope of the 1946 Cooperative Nationwide High-School Testing Program, which
entailed the administration of the 1946 Cooperative Test of Recent Social and
Scientific Developments to some 143,000 high-school pupils in 43 states. He describes the




data obtained and offers recommendations to facilitate attainment of the objec- 
tives of the program. 

In a paper of somewhat broader scope on “Units and Norms in Educational 
Measurement,” John C. Flanagan recognizes theoretical limitations of current 
test scores but stresses their practical usefulness. He summarizes several types 
of derived scores—mainly modifications of standard or normalized standard scores. 
The present trend, as he sees it, is in the direction of norms more immediately 
related to requirements for effective performance in various types of individual 
and social activities. 

Two other papers which transcend specific applications of testing procedures 
will probably be of more general interest to readers of this journal. In a chal-
lenging article on “Validity of Educational Tests,” Phillip J. Rulon gives a clear
statement of his point of view that the basic criterion of the validity of a new 
test springs from the “direct operations-upon-material” approach. Consequences 
of this point of departure as it affects the practice of educational achievement 
testing are carefully explored. The test that calls for processes and materials 
directly related to teaching objectives needs no further establishment of validity. 
A test that uses different processes and materials requires evidence that it is re-
lated to the abilities constituting the aim.

Robert L. Thorndike, speaking on “Logical Dilemmas in the Estimation of
Reliability,” asks us to take a fresh look at the basic postulates of reliability 
estimation. Contrasting the statistical and experimental points of view, in this 
paper he stresses the logic of available experimental choices, with their impli- 
cations. Reliability is considered as having to do with the precision of measure- 
ment resulting from observing and quantifying a sample of behavior. Exploring 
the ramifications of this definition, Thorndike arrives at the position that the 
relationship between scores on comparable forms of a test administered with an 
intervening period of time is superior to the test-retest method or to any method 
based on a single administration of a single form if the purpose is to make gen- 
eralizations beyond the sample of material in one form or from one time to an- 
other. 

The 1947 Conference was divided into two parts with four papers each on 
quite different themes, “Projective Techniques and Their Validity” and “Differen- 
tial Prediction.” In many respects the most stimulating of these papers is Per- 
cival M. Symonds’ “Survey of Projective Techniques.” He contrasts projective 
techniques, which call for the subject to express some fantasy and attribute cer- 
tain characteristics to the fantasy object, with expressive techniques, which call 
for an expression of the subject’s skill and habit tendencies. The former, wherein 
the significance lies in content, is typified by the Thematic Apperception Test; 
the latter, wherein the heart of the interpretation is the formal characteristics of 
the response, by the Rorschach. On the basis of the limited evidence available, 
Symonds concludes that expressive techniques have some diagnostic value, as 
partially evidenced by correspondence between blind interpretations and inde- 
pendent diagnoses, but little predictive value. On the other hand, he cites his 
own studies as well as others in support of the generalization that projective 
techniques have little if any value either for diagnosis or prediction. Abandon- 
ing the psychometric approach to both expressive and projective techniques, how- 
ever, Symonds finds their real significance in a “descriptive and analytical ap- 
proach.” This position may leave some readers wondering, first, what one does 
with a diagnosis from which some type of prediction cannot be made and, second, 
what value attaches to insight gained through a descriptive and analytical ap- 













proach if it cannot in some way be used for either diagnosis or prediction. 

Eugenia Hanfmann describes and evaluates the use of projective (or per- 
haps expressive?) techniques in the wartime selection of candidates for overseas 
service with the Office of Strategic Services. This assessment program tried out 
various projective techniques. Some, including variations of the TAT and Ror- 
schach, were abandoned or applied only to special cases because gains were dis- 
proportionate to time expended. The two projective techniques found most useful 
were a sentence completion test and “improvisations,” a variation of psycho-
drama. The speaker believed their value came from the fact that they are aimed
at eliciting material closer to the level of manifest attitudes or behavior than that 
obtained through the Rorschach or the TAT. 

A different type of application of a projective technique is described in 
Ruth L. Munroe’s paper on the relation of academic success at Sarah Lawrence 
to ACE scores and to a measure of personal adjustment called the “Inspection 
Rorschach.” The score is based on a prescribed check list of items, which repre- 
sent a sort of distillation of reported experience with the Rorschach and the 
speaker’s personal training and experience. This measure showed positive asso- 
ciation with academic standing and, combined with ACE scores, improved the 
prediction of scholarship. In view of the results obtained, some of the cautions 
with which the speaker surrounds her interpretations, such as the defense of the 
position that an “over-all empirical criterion of success in a given situation de-
feats its own end,” seem unnecessary. As a matter of fact, the measure which
was shown to be positively related to the personal adjustment score, academic 
standing, is an over-all empirical criterion of success! To be sure, the adjustment 
score does not predict it perfectly any more than does an aptitude test. 

The most striking point in Thomas M. Harris’ paper on the use of projective 
techniques in industrial selection relates to the success of a single projective test, 
an ink-blot series called the ITP, in predicting success or failure in jobs that re- 
quire specialized talents. We are told little about the test itself or its interpre- 
tation except that it is quite far removed from the Rorschach. Harris cites a dra- 
matic instance of the success of the ITP test in selecting creative chemists, the 
criterion being an unusually objective one based on patent records. Somewhat to 
our astonishment, we are indirectly led to believe that the same test, presumably 
scored in different ways, is useful for selecting engineers, production managers, 
salesmen, research workers, supervisors, and executives. 

The second half of the 1947 Conference had differential prediction as its 
theme. Ledyard R. Tucker presented some theoretical considerations bearing on 
the problems of differential criteria. To answer the question of the field in which 
a person is most likely to succeed, Tucker proposes analyses of related criteria 
and concentration on those elements distinctive of the several fields. He outlines 
a procedure calling for collection of data on all established aptitudes (possibly 
along with personality and interest traits), a job analysis of criterion fields, ob- 
taining measures of proficiency in the various tasks for each field, performing a 
factor analysis for each field, and then applying the results. It will be a long 
time before this job is done. 

George K. Bennett and Harold Seashore discuss several of the practical and 
technical problems encountered in development of the Psychological Corpora-
tion’s Differential Aptitude Tests. Among other things, they consider the nature
of the tests to be included, the actual form of the tests, the manner in which scores 
should be reported, and difficulties in obtaining adequate norms and validation 
data. 










Henry S. Dyer makes clear the apprehensions that beset an investigator of
the differential prediction of college success. He discusses various explanations 
for the fact that results up to now have not been very encouraging but contends 
that, if we are to continue to offer differential guidance, we must accept as a 
working hypothesis that criteria are differentiable and sufficiently reliable for 
prediction purposes. He describes a recently begun project at Harvard (with the 
cooperation of College Entrance Examination Board staff), which is being fo- 
cused on attempts to discover which of many particular observations and their 
combinations are most predictive of specific types of college achievement. This 
study should go a long way toward telling us whether differential prediction of 
college success is feasible. 

Benjamin Bloom reports on the first four years of an investigation by a 
committee at the University of Chicago of various types of outcomes of the edu- 
cational program of the College. Several subcommittees have developed plans for 
collection of several types of data. One of the major subcommittees has dealt 
with the use of tests in studying how students change in the ways indicated by 
the objectives of the College. A comprehensive outline of these objectives, in 
terms of knowledge, skills, and abilities, was first formulated in cooperation with 
faculty groups, and tests were selected or designed to test these objectives. Bloom 
offers certain generalizations based on the data on 4000 students tested up to the 
time of the report. He hopes that such a study can be reconstituted as a coopera- 
tive venture with other educational institutions. 

No attempt will be made to review the discussion, to which a total of about 
28 pages is devoted in the two pamphlets, except to mention that it is very dis- 
cursive. Perhaps more liberal editing and deletion would not have been amiss. 


The University of North Carolina DOROTHY C. ADKINS










